C#: Fastest Way to TRIM strings
This post benchmarks several techniques to find the fastest way to trim strings in C# .Net.
Trimming means removing all the whitespace at the beginning and/or end of a string. Every programmer needs to do this at some point, and most, if not all, use a built-in library or method call. Why not? Simple. Elegant.
But that’s when this Curious Consultant started wondering in C#: what is the fastest way to trim strings?
The Set Up:
I wrote a C# Console application to test multiple techniques.
The code is written in Visual Studio 2013 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system if you wish.
In a nutshell, the code does the following:
- Creates a specified number of strings (1,000, 100,000, 10,000,000, 50,000,000), with maximum lengths of both 28 and 128 characters, each padded with 5 whitespace characters on the left, the right, or both ends.
- Uses one of the techniques below to remove the white space from both the beginning and end of the strings:
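  - T0: .Trim()
    stringsTrimmed[x] = s[x].Trim();
  - T1: .TrimEnd().TrimStart()
    stringsTrimmed[x] = s[x].TrimEnd().TrimStart();
  - T2: .TrimStart().TrimEnd()
    stringsTrimmed[x] = s[x].TrimStart().TrimEnd();
  - T3: Char.IsWhiteSpace + Substring() using while loop
    stringsTrimmed[x] = s[x].Substring(startIndex, (endIndex - startIndex));
  - T4: Char.IsWhiteSpace + Remove() using while loop
    stringsTrimmed[x] = s[x].Remove(endIndex, (s[x].Length - endIndex)).Remove(0, startIndex);
  - T5: Char.IsWhiteSpace + Substring() using for loop
    stringsTrimmed[x] = s[x].Substring(startIndex, (endIndex - startIndex));
  - T6: Char.IsWhiteSpace + Remove() using for loop
    stringsTrimmed[x] = s[x].Remove(endIndex, (s[x].Length - endIndex)).Remove(0, startIndex);
  - T7: Regex Compiled
    Regex r1 = new Regex(@"^\s+|\s+$", RegexOptions.Compiled);
    stringsTrimmed[x] = r1.Replace(s[x], String.Empty);
  - T8: Regex Not Compiled
    stringsTrimmed[x] = Regex.Replace(s[x], @"^\s+|\s+$", String.Empty);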
- Sums up the length of the strings afterwards for each technique to verify they each have the same number of characters.
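The set-up steps above can be sketched in a few lines. This is only an illustration, not the benchmark itself: `MakePaddedStrings` and `SumOfTrimmedLengths` are hypothetical helper names, and the generator uses repeated letters instead of `Membership.GeneratePassword` so that it runs without a reference to System.Web.

```csharp
using System;

static class ChecksumSketch
{
    // Hypothetical stand-in for the article's generator: lengths count down
    // from maxLength to 1 and then start over, and each string gets `pad`
    // whitespace characters on the left, the right, or both ends.
    public static string[] MakePaddedStrings(int count, int maxLength, int pad)
    {
        var result = new string[count];
        int length = maxLength;
        for (int i = 0; i < count; i++)
        {
            string core = new string((char)('a' + (i % 26)), length);
            length = (length == 1) ? maxLength : length - 1;

            if (i % 3 == 0) result[i] = new string(' ', pad) + core;
            else if (i % 3 == 1) result[i] = core + new string('\n', pad);
            else result[i] = new string(' ', pad) + core + new string('\t', pad);
        }
        return result;
    }

    // Adds up the trimmed lengths; two correct techniques must give the same sum.
    public static long SumOfTrimmedLengths(string[] inputs, Func<string, string> trim)
    {
        long sum = 0;
        foreach (string s in inputs) sum += trim(s).Length;
        return sum;
    }

    static void Main()
    {
        string[] inputs = MakePaddedStrings(1000, 28, 5);
        long a = SumOfTrimmedLengths(inputs, s => s.Trim());
        long b = SumOfTrimmedLengths(inputs, s => s.TrimEnd().TrimStart());
        Console.WriteLine(a == b ? "Sums match: " + a : "Mismatch: " + a + " vs " + b);
    }
}
```

A matching sum does not prove two techniques produce identical strings, but it is a cheap way to catch a technique that trims too much or too little.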
The code assumes all conversions will happen with no exception testing because I’m just wanting to test the raw speed of trimming.
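One consequence of skipping exception testing: the while-loop techniques index into the string without bounds checks, so an empty or all-whitespace string would throw. A minimal sketch of the same scan-and-Substring idea with those checks added might look like this (`TrimWithSubstring` is a name made up for this example and is not part of the benchmark):

```csharp
using System;

static class SafeTrimSketch
{
    // Same idea as T3 (scan with char.IsWhiteSpace, then Substring), plus the
    // bounds checks that the benchmark code leaves out.
    public static string TrimWithSubstring(string value)
    {
        if (string.IsNullOrEmpty(value)) return value;

        int start = 0;
        int end = value.Length - 1;

        while (start <= end && char.IsWhiteSpace(value[start])) start++;
        while (end >= start && char.IsWhiteSpace(value[end])) end--;

        // When the string is all whitespace, start == end + 1 and the length is 0.
        return value.Substring(start, end - start + 1);
    }

    static void Main()
    {
        Console.WriteLine("[" + TrimWithSubstring("  abc\t\n") + "]"); // prints [abc]
        Console.WriteLine("[" + TrimWithSubstring("   ") + "]");       // prints []
    }
}
```

The extra comparisons cost a little time, so this version would not necessarily match the T3 timings reported here.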
The exe file was run on Windows 7 64-bit with 32 GB memory.
Ready! Set! Go!
Before starting, I thought the .Trim() method would be roughly the same speed as some of the looping mechanisms.
Let’s see what happened on my machine over 3 runs.
All times are shown in minutes:seconds.fraction format, with the fraction carried to seven decimal places; times under a minute omit the minutes. Lower numbers indicate faster performance. The winner is marked in green and the runner-up in yellow.
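For reference, the timings come from a Stopwatch whose elapsed time is formatted with the custom TimeSpan format string used throughout the source code. The sketch below shows that pattern on its own; the `TimingSketch` class and `Format` helper are names invented for this example.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // The format string the benchmark uses for every "Time:" line.
    public static string Format(TimeSpan elapsed)
    {
        return elapsed.ToString("mm\\:ss\\.fffffff");
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        string trimmed = "  some text  ".Trim();
        sw.Stop();
        Console.WriteLine("Trimmed to " + trimmed.Length + " chars in " + Format(sw.Elapsed));

        // 1,138,099,364 ticks of 100 ns each is 113.8099364 seconds.
        Console.WriteLine(Format(TimeSpan.FromTicks(1138099364L))); // prints 01:53.8099364
    }
}
```

The seven digits after the decimal point are 100-nanosecond ticks, which is why the numbers carry more precision than milliseconds.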
Run #1:
| # of strings: | 1,000 | 1,000 | 100,000 | 100,000 | 10,000,000 | 10,000,000 | 50,000,000 | 50,000,000 |
|---|---|---|---|---|---|---|---|---|
| String lengths: | 28 | 128 | 28 | 128 | 28 | 128 | 28 | 128 |
| T0 | 00.0001555 | 00.0002001 | 00.0143148 | 00.0292701 | 02.1042050 | 08.3650818 | 11.3259276 | 01:53.8099364 |
| T1 | 00.0001934 | 00.0002352 | 00.0258867 | 00.0419493 | 04.4154810 | 07.8107193 | 22.0753356 | 40.6198890 |
| T2 | 00.0002104 | 00.0002262 | 00.0312646 | 00.0580968 | 04.1466723 | 07.2282067 | 39.8048439 | 04:26.4092890 |
| T3 | 00.0001172 | 00.0001962 | 00.0111648 | 00.0316191 | 02.3825578 | 07.9296802 | 11.3407739 | 02:05.5858135 |
| T4 | 00.0001895 | 00.0002787 | 00.0247331 | 00.0402564 | 03.4295945 | 07.8351153 | 22.3675789 | 01:02.5405533 |
| T5 | 00.0001310 | 00.0001737 | 00.0158640 | 00.0237970 | 02.7036543 | 06.1869813 | 20.5532839 | 01:55.8575637 |
| T6 | 00.0001748 | 00.0002506 | 00.0294805 | 00.0759495 | 04.0910378 | 07.7567721 | 22.3131481 | 04:38.5943651 |
| T7 | 00.0090348 | 00.0115789 | 00.1994419 | 00.4917089 | 20.1211046 | 47.5432612 | 01:39.2615294 | 05:48.0521923 |
| T8 | 00.0029984 | 00.0057857 | 00.2975868 | 00.6121313 | 27.6287676 | 59.7483720 | 02:26.8450582 | 06:48.7923743 |
Run #2:
| # of strings: | 1,000 | 1,000 | 100,000 | 100,000 | 10,000,000 | 10,000,000 | 50,000,000 | 50,000,000 |
|---|---|---|---|---|---|---|---|---|
| String lengths: | 28 | 128 | 28 | 128 | 28 | 128 | 28 | 128 |
| T0 | 00.0001271 | 00.0001918 | 00.0153113 | 00.0446730 | 02.2450553 | 09.2303303 | 10.5267592 | 01:54.4426299 |
| T1 | 00.0002013 | 00.0002285 | 00.0257134 | 00.0804150 | 04.6304774 | 09.2303303 | 24.6977446 | 45.3262714 |
| T2 | 00.0002072 | 00.0002424 | 00.0295452 | 00.0498144 | 04.3764480 | 14.4196840 | 36.4593806 | 04:37.4192367 |
| T3 | 00.0001377 | 00.0001385 | 00.0110799 | 00.0230647 | 02.6636935 | 09.9271105 | 25.1969890 | 02:06.3814003 |
| T4 | 00.0001689 | 00.0002775 | 00.0240410 | 00.0463813 | 03.5688827 | 07.9968199 | 21.9606207 | 04:17.6683478 |
| T5 | 00.0001334 | 00.0001784 | 00.0165806 | 00.0507086 | 02.6118585 | 08.3127418 | 22.9514583 | 02:08.0327605 |
| T6 | 00.0001701 | 00.0002862 | 00.0303451 | 00.0499712 | 04.5104454 | 08.2643904 | 22.4619906 | 42.7038166 |
| T7 | 00.0088900 | 00.0114253 | 00.2154256 | 00.5227631 | 19.5905518 | 49.8450767 | 01:47.0186781 | 04:33.1056780 |
| T8 | 00.0029688 | 00.0057253 | 00.2631291 | 00.5529929 | 29.0790510 | 01:13.5078840 | 02:37.1687600 | 06:55.0569717 |
Run #3:
| # of strings: | 1,000 | 1,000 | 100,000 | 100,000 | 10,000,000 | 10,000,000 | 50,000,000 | 50,000,000 |
|---|---|---|---|---|---|---|---|---|
| String lengths: | 28 | 128 | 28 | 128 | 28 | 128 | 28 | 128 |
| T0 | 00.0001263 | 00.0001717 | 00.0149260 | 00.0279590 | 02.1341783 | 08.6322126 | 11.2097993 | 01:50.0741461 |
| T1 | 00.0001985 | 00.0001914 | 00.0261054 | 00.0420918 | 04.4321745 | 08.3776884 | 22.1147168 | 41.6335122 |
| T2 | 00.0001784 | 00.0002293 | 00.0292389 | 00.0581134 | 04.1729740 | 08.7895701 | 33.5683055 | 04:27.3077117 |
| T3 | 00.0001393 | 00.0001709 | 00.0108304 | 00.0324849 | 02.5714469 | 07.6751860 | 10.2752051 | 01:55.8796821 |
| T4 | 00.0001752 | 00.0002783 | 00.0243841 | 00.0408684 | 03.3974558 | 08.1059449 | 21.6428819 | 39.1922973 |
| T5 | 00.0001227 | 00.0001598 | 00.0161037 | 00.0245890 | 02.6836524 | 07.1333938 | 21.8612290 | 01:53.4942438 |
| T6 | 00.0001693 | 00.0002747 | 00.0284714 | 00.0769491 | 04.1125948 | 07.0372390 | 22.1117527 | 42.6185230 |
| T7 | 00.0089215 | 00.0113337 | 00.1968631 | 00.4915241 | 19.8634647 | 46.8572479 | 01:39.2790306 | 04:07.9180472 |
| T8 | 00.0029518 | 00.0057138 | 00.2973902 | 00.6133417 | 19.8634647 | 57.5029404 | 02:28.3609601 | 06:46.4557162 |
Well, we at least know what NOT to use!
Looking at the results, one thing is clear: don’t ever use Regex where speed is a necessity. Otherwise, looking at the green color patterns, there’s no clear-cut winner.
That holds even if we go by the total number of “wins” and “runner-up” finishes for each technique (listed as wins, runner-ups):
T0: 5, 9
T1: 1, 2
T2: 0, 1
T3: 8, 4
T4: 2, 2
T5: 6, 5
T6: 2, 1
T3 won 33% of the time (8 of 24 tests) and finished in the top two 50% of the time (12 of 24);
T0 only won 21% of the time (5 of 24) but finished in the top two 58% of the time (14 of 24).
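The tallies can be read two ways, and the arithmetic is easy to mix up. There were 24 tests, each producing one winner and one runner-up, so a technique's wins plus runner-up finishes can be divided either by 24 tests or by all 48 top-two places. The sketch below works both out from the counts listed above; the class and method names are made up for this example.

```csharp
using System;

static class TallySketch
{
    // Share of tests in which a technique placed first or second.
    public static double TopTwoRate(int wins, int runnerUps, int tests)
    {
        return 100.0 * (wins + runnerUps) / tests;
    }

    // Share of all first and second places (two per test) taken by a technique.
    public static double ShareOfTopTwoSlots(int wins, int runnerUps, int tests)
    {
        return 100.0 * (wins + runnerUps) / (2 * tests);
    }

    static void Main()
    {
        const int tests = 24; // 3 runs x 4 string counts x 2 string lengths

        // T3: 8 wins, 4 runner-up finishes
        Console.WriteLine("T3 win rate:       " + (100.0 * 8 / tests).ToString("0") + "%");           // 33%
        Console.WriteLine("T3 top-two rate:   " + TopTwoRate(8, 4, tests).ToString("0") + "%");         // 50%
        Console.WriteLine("T3 share of slots: " + ShareOfTopTwoSlots(8, 4, tests).ToString("0") + "%"); // 25%

        // T0: 5 wins, 9 runner-up finishes
        Console.WriteLine("T0 win rate:       " + (100.0 * 5 / tests).ToString("0") + "%");           // 21%
        Console.WriteLine("T0 top-two rate:   " + TopTwoRate(5, 9, tests).ToString("0") + "%");         // 58%
        Console.WriteLine("T0 share of slots: " + ShareOfTopTwoSlots(5, 9, tests).ToString("0") + "%"); // 29%
    }
}
```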
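With 50,000,000 strings of 128 characters, T1 was consistently among the fastest, finishing in roughly 40 to 45 seconds in all three runs. T4 and T6 were each near the front in two of the three runs (39 seconds to just over a minute) but took over four minutes in the other, and every remaining technique took close to two minutes or more. That pattern showed up only for 50,000,000 strings of 128 characters.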
For 50,000,000 strings 28 characters in length, T0 and T3 seemed to perform the best consistently.
Final Say:
On my system, unless someone spots a flaw in my test code, the built-in C# .Trim() method should be fine for the general populace. It’s not always the fastest or even second fastest, but the average user probably wouldn’t notice.
If you need to micro-optimize where every millisecond counts, there’s no easy answer other than you’ll have to micro-optimize your code and test the various techniques. Or you could hire me to do it for you. 🙂
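If you do want to test techniques on your own data, a small harness is enough to get started. The sketch below is one possible shape, not the code used for the results above: `Measure` is a hypothetical helper, the inputs are made up, and calling each technique through a delegate adds overhead that the original inline loops do not have.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class TrimBenchmarkSketch
{
    // Times one technique over all inputs and reports the sum of trimmed
    // lengths so that results can be cross-checked between techniques.
    public static TimeSpan Measure(string[] inputs, Func<string, string> trim, out long sumOfLengths)
    {
        long sum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += trim(inputs[i]).Length;
        }
        sw.Stop();
        sumOfLengths = sum;
        return sw.Elapsed;
    }

    static void Main()
    {
        string[] inputs = new string[100000];
        for (int i = 0; i < inputs.Length; i++) inputs[i] = "     item " + i + "\t\t\t\t\t";

        var candidates = new Dictionary<string, Func<string, string>>
        {
            { ".Trim()", s => s.Trim() },
            { ".TrimEnd().TrimStart()", s => s.TrimEnd().TrimStart() }
        };

        foreach (KeyValuePair<string, Func<string, string>> candidate in candidates)
        {
            long sum;
            Measure(inputs, candidate.Value, out sum); // warm-up pass so JIT cost is not timed
            TimeSpan elapsed = Measure(inputs, candidate.Value, out sum);
            Console.WriteLine(candidate.Key + ": " + elapsed + " (sum of lengths " + sum + ")");
        }
    }
}
```

Run it several times and on a release build; as the three runs above show, a single measurement can be misleading.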
Lastly, for any application where trimming speed matters, don’t use Regex: in these runs it was often slower than .Trim() by a factor of 10 or more, even at only 1,000 strings.
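For anyone who still needs the Regex form, for example as part of a larger pattern, the sketch below shows the compiled pattern from T7 held in a static field and checks that it gives the same result as .Trim() on a sample input. `RegexTrimSketch` and `TrimWithRegex` are names invented here. The likely reason for the gap, offered as an assumption rather than something this benchmark measured, is that the regex engine does general pattern matching while .Trim() only scans inward from both ends.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexTrimSketch
{
    // Built once and reused, like r1 in the benchmark (T7).
    private static readonly Regex EdgeWhitespace =
        new Regex(@"^\s+|\s+$", RegexOptions.Compiled);

    public static string TrimWithRegex(string value)
    {
        return EdgeWhitespace.Replace(value, string.Empty);
    }

    static void Main()
    {
        string input = "  \tpadded value\n ";
        Console.WriteLine(TrimWithRegex(input) == input.Trim()); // prints True
    }
}
```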
The Code:
    using System;
    using System.Collections.Generic;
    using System.Collections;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;
    using System.Threading;
    using System.Diagnostics;

    namespace TestApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                DateTime end;
                DateTime start = DateTime.Now;

                TestFastestWayToTrimAString(1000, 28, 5);
                TestFastestWayToTrimAString(1000, 128, 5);
                TestFastestWayToTrimAString(100000, 28, 5);
                TestFastestWayToTrimAString(100000, 128, 5);
                TestFastestWayToTrimAString(10000000, 28, 5);
                TestFastestWayToTrimAString(10000000, 128, 5);
                TestFastestWayToTrimAString(50000000, 28, 5);
                TestFastestWayToTrimAString(50000000, 128, 5);

                end = DateTime.Now;
                Console.WriteLine();
                Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());
                Console.WriteLine("### Overall Run Time: " + (end - start));
                Console.WriteLine();
                Console.WriteLine("Hit Enter to Exit");
                Console.ReadLine();
            }

            //#########################################################################################################
            //What's the fastest way to Trim() a string?
            static void TestFastestWayToTrimAString(int NumberOfStrings, int MaxLengthOfStrings, int LengthOfWhiteSpace)
            {
                Console.WriteLine();
                Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);
                Console.WriteLine("Number of Random Strings that will be generated: " + NumberOfStrings.ToString("#,##0"));
                Console.WriteLine("Max Length of Strings that will be generated: " + MaxLengthOfStrings.ToString("#,##0"));
                Console.WriteLine("Amount of White Space on either side: " + LengthOfWhiteSpace.ToString("#,##0"));
                Console.WriteLine();

                // Use two precompiled Regexes.
                Regex r1 = new Regex(@"^\s+|\s+$", RegexOptions.Compiled);
                //Regex r2 = new Regex(@"\s+$", RegexOptions.Compiled);

                Stopwatch sw = new Stopwatch();
                DateTime end = DateTime.Now;
                DateTime start = DateTime.Now;

                //the strings to search
                string[] s = new string[NumberOfStrings];
                string[] stringsTrimmed = new string[NumberOfStrings];

                //used for while loops
                int startIndex = 0;
                int endIndex = 0;
                long sumOfStringLengths = 0; //sum should be the same in all different techniques used. helps verify string trimming logic is correct.

                //Generate the string arrays
                int z = MaxLengthOfStrings;
                Console.WriteLine("Generating strings #########################################");
                for (int x = 0; x < NumberOfStrings; x++)
                {
                    s[x] = System.Web.Security.Membership.GeneratePassword(z, ((z % 5) == 0 ? 1 : (z % 5)));
                    z -= 1;
                    if (z == 0)
                        z = MaxLengthOfStrings;

                    //generate random white space
                    if (x % 3 == 0)
                        s[x] = s[x].PadLeft(s[x].Length + LengthOfWhiteSpace, ' ');
                    else if (x % 3 == 1)
                        s[x] = s[x].PadRight(s[x].Length + LengthOfWhiteSpace, '\n');
                    else
                        s[x] = s[x].PadLeft(s[x].Length + LengthOfWhiteSpace, ' ').PadRight(s[x].Length + (LengthOfWhiteSpace * 2), '\t');
                }
                Console.WriteLine("###########################################################");
                Console.WriteLine();

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting .Trim() at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    stringsTrimmed[x] = s[x].Trim();
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting .TrimEnd().TrimStart() at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    stringsTrimmed[x] = s[x].TrimEnd().TrimStart();
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting .TrimStart().TrimEnd() at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    stringsTrimmed[x] = s[x].TrimStart().TrimEnd();
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting char.IsWhiteSpace + substring using while loop: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    startIndex = 0;
                    endIndex = s[x].Length - 1;

                    //get the starting point of the string without the whitespace
                    while (char.IsWhiteSpace(s[x][startIndex]))
                        startIndex += 1;

                    //get the end point of the string without the whitespace
                    while (char.IsWhiteSpace(s[x][endIndex]))
                        endIndex -= 1;
                    endIndex += 1;

                    //remove the whitespace
                    stringsTrimmed[x] = s[x].Substring(startIndex, (endIndex - startIndex));
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting char.IsWhiteSpace + .Remove using while loop: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    startIndex = 0;
                    endIndex = s[x].Length - 1;

                    //get the starting point of the string without the whitespace
                    while (char.IsWhiteSpace(s[x][startIndex]))
                        startIndex += 1;

                    //get the end point of the string without the whitespace
                    while (char.IsWhiteSpace(s[x][endIndex]))
                        endIndex -= 1;
                    endIndex += 1;

                    //remove the whitespace, starting with end ones first then those at beginning
                    stringsTrimmed[x] = s[x].Remove(endIndex, (s[x].Length - endIndex)).Remove(0, startIndex);
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting char.IsWhiteSpace + substring using for loop: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    startIndex = 0;
                    endIndex = s[x].Length - 1;

                    //get the starting point of the string without the whitespace
                    for (int y = 0; y < s[x].Length; y++)
                    {
                        if (char.IsWhiteSpace(s[x][y]))
                            startIndex += 1;
                        else
                            break;
                    }

                    //get the end point of the string without the whitespace
                    for (int y = (s[x].Length - 1); y > 0; y--)
                    {
                        if (char.IsWhiteSpace(s[x][y]))
                            endIndex -= 1;
                        else
                            break;
                    }
                    endIndex += 1;

                    //remove the whitespace
                    stringsTrimmed[x] = s[x].Substring(startIndex, (endIndex - startIndex));
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting char.IsWhiteSpace + .Remove using for loop: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    startIndex = 0;
                    endIndex = s[x].Length - 1;

                    //get the starting point of the string without the whitespace
                    for (int y = 0; y < s[x].Length; y++)
                    {
                        if (char.IsWhiteSpace(s[x][y]))
                            startIndex += 1;
                        else
                            break;
                    }

                    //get the end point of the string without the whitespace
                    for (int y = (s[x].Length - 1); y > 0; y--)
                    {
                        if (char.IsWhiteSpace(s[x][y]))
                            endIndex -= 1;
                        else
                            break;
                    }
                    endIndex += 1;

                    //remove the whitespace, starting with end ones first then those at beginning
                    stringsTrimmed[x] = s[x].Remove(endIndex, (s[x].Length - endIndex)).Remove(0, startIndex);
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting Regex Compiled at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    stringsTrimmed[x] = r1.Replace(s[x], String.Empty);
                    //stringsTrimmed[x] = r2.Replace(stringsTrimmed[x], String.Empty);
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                Thread.Sleep(500);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                Console.WriteLine("###########################################################");
                Console.WriteLine("Starting Regex NOT Compiled at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("###########################################################");
                sumOfStringLengths = 0;
                sw.Restart();
                for (int x = 0; x < s.Length; x++)
                {
                    stringsTrimmed[x] = Regex.Replace(s[x], @"^\s+|\s+$", String.Empty);
                    sumOfStringLengths += stringsTrimmed[x].Length;
                }
                sw.Stop();
                Console.WriteLine("Finished at: " + DateTime.Now.ToLongTimeString());
                Console.WriteLine("Sum of trimmed string lengths: " + sumOfStringLengths.ToString("#,##0"));
                Console.WriteLine("Time: " + sw.Elapsed.ToString("mm\\:ss\\.fffffff"));

                //clean up
                Array.Clear(s, 0, s.Length);
                Array.Clear(stringsTrimmed, 0, stringsTrimmed.Length);
                s = null;
                stringsTrimmed = null;
                GC.Collect();
            }
        } //class
    } //namespace