C#: Fastest way to TRIM strings

This post benchmarks various techniques to determine the fastest way to trim strings in C# .NET.

Every programmer at some point needs to remove all the whitespace at the beginning and/or end of a string. Most programmers, if not all, typically use a built-in library or method call. Why not? Simple. Elegant.

But that’s when this Curious Consultant started wondering in C#: what is the fastest way to trim strings?

The Set Up:

I wrote a C# Console application to test multiple techniques.

The code is written in Visual Studio 2013 targeting .NET Framework 4.5 (x64). The source code is available at the end of this post so you can benchmark it on your own system if you wish.

In a nutshell, the code does the following:

  1. Creates a specified number of strings (1,000, 100,000, 10,000,000, or 50,000,000), either 28 or 128 characters long, with a random amount of whitespace on either end.
  2. Uses one of the techniques below to remove the whitespace from both the beginning and end of the strings:

     | # | Technique | Code Snippet |
     |---|---|---|
     | T0 | .Trim() | `stringsTrimmed[x] = s[x].Trim();` |
     | T1 | .TrimEnd().TrimStart() | `stringsTrimmed[x] = s[x].TrimEnd().TrimStart();` |
     | T2 | .TrimStart().TrimEnd() | `stringsTrimmed[x] = s[x].TrimStart().TrimEnd();` |
     | T3 | Char.IsWhiteSpace + Substring() using while loop | |
     | T4 | Char.IsWhiteSpace + Remove() using while loop | |
     | T5 | Char.IsWhiteSpace + Substring() using for loop | |
     | T6 | Char.IsWhiteSpace + Remove() using for loop | |
     | T7 | Regex Compiled | |
     | T8 | Regex Not Compiled | `stringsTrimmed[x] = Regex.Replace(s[x], @"^\s+\|\s+$", String.Empty);` |

  3. Sums up the lengths of the trimmed strings for each technique to verify that every technique leaves the same total number of characters.
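The table gives no snippets for the hand-rolled techniques (T3 through T6). As a rough illustration only, and not necessarily the code that was benchmarked, a Char.IsWhiteSpace trim built on Substring() and on Remove() with while loops might look something like the sketch below. The class and method names are invented for the example.

```csharp
using System;

// Hypothetical helper class; the names are assumptions, not taken from the benchmark source.
public static class ManualTrimSketch
{
    // T3-style: find the first and last non-whitespace characters, then Substring().
    public static string TrimWithSubstring(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        // Advance past leading whitespace.
        while (start <= end && char.IsWhiteSpace(s[start])) start++;

        // Back up over trailing whitespace.
        while (end >= start && char.IsWhiteSpace(s[end])) end--;

        // Copy out only the surviving middle section (may be empty).
        return s.Substring(start, end - start + 1);
    }

    // T4-style: same scan, but cut the ends off with Remove().
    public static string TrimWithRemove(string s)
    {
        int start = 0;
        while (start < s.Length && char.IsWhiteSpace(s[start])) start++;
        s = s.Remove(0, start);

        int end = s.Length;
        while (end > 0 && char.IsWhiteSpace(s[end - 1])) end--;

        // Two-argument overload, so end == s.Length (nothing to remove) is valid.
        return s.Remove(end, s.Length - end);
    }
}
```

Note that the Remove() variant allocates an intermediate string after the first cut, while the Substring() variant allocates only the final result. The for-loop versions (T5, T6) would presumably perform the same scan with a different loop construct.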

The code assumes every trim will succeed and does no exception handling, because I only want to test the raw speed of trimming.
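The three steps above can be sketched roughly as follows. This is a minimal illustration rather than the benchmark program itself: the class and method names are made up, the filler character 'x' is arbitrary, and it assumes the random padding is carved out of the fixed total length (28 or 128), which the description does not actually specify.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness; all names here are assumptions for illustration.
public static class TrimBenchmarkSketch
{
    // Step 1: build `count` strings of `length` characters with random space padding on both ends.
    public static string[] MakePaddedStrings(int count, int length, Random rng)
    {
        var result = new string[count];
        for (int i = 0; i < count; i++)
        {
            // Assumption: each side gets between 0 and length/4 spaces.
            int left = rng.Next(0, length / 4 + 1);
            int right = rng.Next(0, length / 4 + 1);
            int core = length - left - right;
            result[i] = new string(' ', left) + new string('x', core) + new string(' ', right);
        }
        return result;
    }

    // Steps 2 and 3: time one trimming technique, then sum the trimmed lengths as a checksum.
    public static long TimeTrim(string[] source, Func<string, string> trim, out TimeSpan elapsed)
    {
        var stringsTrimmed = new string[source.Length];

        var stopwatch = Stopwatch.StartNew();
        for (int x = 0; x < source.Length; x++)
        {
            stringsTrimmed[x] = trim(source[x]);
        }
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;

        // The checksum is computed outside the timed region.
        long totalLength = 0;
        foreach (string t in stringsTrimmed) totalLength += t.Length;
        return totalLength;
    }
}
```

Calling TimeTrim once with `s => s.Trim()` and once with `s => s.TrimEnd().TrimStart()` on the same array should return the same checksum. Passing the technique as a delegate adds a small call overhead to every iteration, so a harness chasing the last few percent would more likely inline each technique in its own loop.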

The exe file was installed and run on an Alienware M17X R3 running Windows 7 64-bit with 32 GB memory on an i7-2820QM processor.

Ready! Set! Go!

Before starting, I thought the .Trim() method would be roughly the same speed as some of the looping mechanisms.

Let’s see what happened on my machine over 3 runs.

All times are in minutes:seconds.fractional-seconds format, with the minutes omitted for times under one minute. Lower numbers indicate faster performance. Winner marked in green; runner-up in yellow.

Run #1:

| # of strings / string length | 1,000 / 28 | 1,000 / 128 | 100,000 / 28 | 100,000 / 128 | 10,000,000 / 28 | 10,000,000 / 128 | 50,000,000 / 28 | 50,000,000 / 128 |
|---|---|---|---|---|---|---|---|---|
| T0 | 00.0001555 | 00.0002001 | 00.0143148 | 00.0292701 | 02.1042050 | 08.3650818 | 11.3259276 | 01:53.8099364 |
| T1 | 00.0001934 | 00.0002352 | 00.0258867 | 00.0419493 | 04.4154810 | 07.8107193 | 22.0753356 | 40.6198890 |
| T2 | 00.0002104 | 00.0002262 | 00.0312646 | 00.0580968 | 04.1466723 | 07.2282067 | 39.8048439 | 04:26.4092890 |
| T3 | 00.0001172 | 00.0001962 | 00.0111648 | 00.0316191 | 02.3825578 | 07.9296802 | 11.3407739 | 02:05.5858135 |
| T4 | 00.0001895 | 00.0002787 | 00.0247331 | 00.0402564 | 03.4295945 | 07.8351153 | 22.3675789 | 01:02.5405533 |
| T5 | 00.0001310 | 00.0001737 | 00.0158640 | 00.0237970 | 02.7036543 | 06.1869813 | 20.5532839 | 01:55.8575637 |
| T6 | 00.0001748 | 00.0002506 | 00.0294805 | 00.0759495 | 04.0910378 | 07.7567721 | 22.3131481 | 04:38.5943651 |
| T7 | 00.0090348 | 00.0115789 | 00.1994419 | 00.4917089 | 20.1211046 | 47.5432612 | 01:39.2615294 | 05:48.0521923 |
| T8 | 00.0029984 | 00.0057857 | 00.2975868 | 00.6121313 | 27.6287676 | 59.7483720 | 02:26.8450582 | 06:48.7923743 |

Run #2:

| # of strings / string length | 1,000 / 28 | 1,000 / 128 | 100,000 / 28 | 100,000 / 128 | 10,000,000 / 28 | 10,000,000 / 128 | 50,000,000 / 28 | 50,000,000 / 128 |
|---|---|---|---|---|---|---|---|---|
| T0 | 00.0001271 | 00.0001918 | 00.0153113 | 00.0446730 | 02.2450553 | 09.2303303 | 10.5267592 | 01:54.4426299 |
| T1 | 00.0002013 | 00.0002285 | 00.0257134 | 00.0804150 | 04.6304774 | 09.2303303 | 24.6977446 | 45.3262714 |
| T2 | 00.0002072 | 00.0002424 | 00.0295452 | 00.0498144 | 04.3764480 | 14.4196840 | 36.4593806 | 04:37.4192367 |
| T3 | 00.0001377 | 00.0001385 | 00.0110799 | 00.0230647 | 02.6636935 | 09.9271105 | 25.1969890 | 02:06.3814003 |
| T4 | 00.0001689 | 00.0002775 | 00.0240410 | 00.0463813 | 03.5688827 | 07.9968199 | 21.9606207 | 04:17.6683478 |
| T5 | 00.0001334 | 00.0001784 | 00.0165806 | 00.0507086 | 02.6118585 | 08.3127418 | 22.9514583 | 02:08.0327605 |
| T6 | 00.0001701 | 00.0002862 | 00.0303451 | 00.0499712 | 04.5104454 | 08.2643904 | 22.4619906 | 42.7038166 |
| T7 | 00.0088900 | 00.0114253 | 00.2154256 | 00.5227631 | 19.5905518 | 49.8450767 | 01:47.0186781 | 04:33.1056780 |
| T8 | 00.0029688 | 00.0057253 | 00.2631291 | 00.5529929 | 29.0790510 | 01:13.5078840 | 02:37.1687600 | 06:55.0569717 |

Run #3:

| # of strings / string length | 1,000 / 28 | 1,000 / 128 | 100,000 / 28 | 100,000 / 128 | 10,000,000 / 28 | 10,000,000 / 128 | 50,000,000 / 28 | 50,000,000 / 128 |
|---|---|---|---|---|---|---|---|---|
| T0 | 00.0001263 | 00.0001717 | 00.0149260 | 00.0279590 | 02.1341783 | 08.6322126 | 11.2097993 | 01:50.0741461 |
| T1 | 00.0001985 | 00.0001914 | 00.0261054 | 00.0420918 | 04.4321745 | 08.3776884 | 22.1147168 | 41.6335122 |
| T2 | 00.0001784 | 00.0002293 | 00.0292389 | 00.0581134 | 04.1729740 | 08.7895701 | 33.5683055 | 04:27.3077117 |
| T3 | 00.0001393 | 00.0001709 | 00.0108304 | 00.0324849 | 02.5714469 | 07.6751860 | 10.2752051 | 01:55.8796821 |
| T4 | 00.0001752 | 00.0002783 | 00.0243841 | 00.0408684 | 03.3974558 | 08.1059449 | 21.6428819 | 39.1922973 |
| T5 | 00.0001227 | 00.0001598 | 00.0161037 | 00.0245890 | 02.6836524 | 07.1333938 | 21.8612290 | 01:53.4942438 |
| T6 | 00.0001693 | 00.0002747 | 00.0284714 | 00.0769491 | 04.1125948 | 07.0372390 | 22.1117527 | 42.6185230 |
| T7 | 00.0089215 | 00.0113337 | 00.1968631 | 00.4915241 | 19.8634647 | 46.8572479 | 01:39.2790306 | 04:07.9180472 |
| T8 | 00.0029518 | 00.0057138 | 00.2973902 | 00.6133417 | 19.8634647 | 57.5029404 | 02:28.3609601 | 06:46.4557162 |

Well, we at least know what NOT to use!

Looking at the results, one thing is clear: don’t ever use Regex where speed is a necessity. Otherwise, looking at the green color patterns, there’s no clear-cut winner.

Even if we go by the total number of “wins” and “runner-up” finishes for each technique across all three runs (listed as wins, runner-ups):
T0: 5, 9
T1: 1, 2
T2: 0, 1
T3: 8, 4
T4: 2, 2
T5: 6, 5
T6: 2, 1

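With 24 contests in total (3 runs × 8 columns), T3 won 33% of the time (8 of 24) and finished in the top two 50% of the time (12 of 24);
T0 won only 21% of the time (5 of 24) but finished in the top two 58% of the time (14 of 24).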

When dealing with 50,000,000 strings of 128 characters, T1 was the only technique that finished in the top two in every run, at roughly 41 to 45 seconds against nearly two minutes for T0. T4 and T6 were each just as fast in some runs but took over four minutes in another. This pattern showed up only for 50,000,000 strings of 128 characters.

For 50,000,000 strings 28 characters in length, T0 and T3 seemed to perform the best consistently.

Final Say:

On my system, unless someone spots a flaw in my test code, the built-in C# .Trim() method should be fine for the general populace. It’s not always the fastest or even second fastest, but the average user probably wouldn’t notice.

If you need to micro-optimize where every millisecond counts, there’s no easy answer other than you’ll have to micro-optimize your code and test the various techniques. Or you could hire me to do it for you. 🙂

Lastly, for any application where you’ll need to trim more than 1,000 strings, don’t use Regex: in these tests it was often slower than .Trim() by a factor of 10 or more.
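For reference, the difference between the two Regex variants (T7 and T8) comes down to how the pattern is prepared. The sketch below is a guess at the general shape, reusing the pattern from the T8 snippet; the class, field, and method names are invented, and the benchmarked T7 code may well have been written differently.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; names are assumptions for illustration.
public static class RegexTrimSketch
{
    // T7-style: build the Regex once with RegexOptions.Compiled and reuse it for every string.
    private static readonly Regex EdgeWhitespace =
        new Regex(@"^\s+|\s+$", RegexOptions.Compiled);

    public static string TrimCompiled(string s)
    {
        return EdgeWhitespace.Replace(s, string.Empty);
    }

    // T8-style: the static Regex.Replace call from the table, with no Compiled option.
    public static string TrimNotCompiled(string s)
    {
        return Regex.Replace(s, @"^\s+|\s+$", string.Empty);
    }
}
```

RegexOptions.Compiled trades a higher one-time construction cost for faster matching afterwards, which would be consistent with T7 being the slower of the two at 1,000 strings and the faster one at larger counts.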

The Code:

  • Nice article. Thanks.

  • Anders Hybertz

    Yet another nice post. One thing though you might consider in the future could be concurrency performance. Some system API’s might behave really different under load in a multithreaded environment. Often there are some unforeseen lock contentions.