
C# .Net: Fastest Way to Read Text Files


This post examines several techniques to determine the fastest way in C# .Net to read a single text file.

I have seen a lot of questions around the internet asking, “What’s the fastest way to read a text file?” I’ve written numerous applications that do this, but never gave it serious consideration until I had to write an application that read text files with several hundred million lines for processing.

 

The Set Up:

I wrote a C# Console application to test 9 different techniques to read a text file. This isn’t an exhaustive list, but I believe covers how it’s done most of the time.

The code is written in Visual Studio 2012 targeting .Net Framework version 4.5 x64. The source code is available at the end so you can benchmark it on your own system if you wish.

In a nutshell, the code does the following:

1) Generates a GUID

2) Creates a string object with that GUID repeated either 5, 10, or 25 times

3) Writes the string object to a local text file 4,294,967 times, 2,147,483 times, or 214,748 times (a sketch of steps 1 through 3 appears after this list).

4) It then reads the text file using each of the 9 techniques identified below, clearing all objects and forcing a garbage collection after each run to make sure every run starts with fresh resources:

| # | Technique |
|---|-----------|
| T1 | Reading the entire file into a single string using the StreamReader ReadToEnd() method |
| T2 | Reading the entire file into a single StringBuilder object using the StreamReader ReadToEnd() method |
| T3 | Reading each line into a string using the StreamReader ReadLine() method |
| T4 | Reading each line into a string using a BufferedStream |
| T5 | Reading each line into a string using a BufferedStream with a preset buffer size equal to the size of the biggest line |
| T6 | Reading each line into a StringBuilder object using the StreamReader ReadLine() method |
| T7 | Reading each line into a StringBuilder object with its size preset and equal to the size of the biggest line |
| T8 | Reading each line into a pre-allocated string array object |
| T9 | Reading the entire file into a string array object using the .Net File.ReadAllLines() method |

5) The generated file is then deleted.
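For reference, here is a minimal sketch of what steps 1 through 3 look like in code. The file name, the number of Guids per line, and the line count are hypothetical placeholders; the real benchmark cycles through the combinations listed above.

```csharp
using System;
using System.IO;
using System.Text;

class TestFileGenerator
{
    static void Main()
    {
        const int guidsPerLine = 5;          // hypothetical: 5, 10, or 25 in the benchmark
        const int linesPerFile = 214748;     // hypothetical: one of the three line counts used
        const string path = "testfile.txt";  // hypothetical file name

        // Step 1: generate a GUID.
        string guid = Guid.NewGuid().ToString();

        // Step 2: build a string with that GUID repeated guidsPerLine times.
        var sb = new StringBuilder();
        for (int i = 0; i < guidsPerLine; i++)
            sb.Append(guid);
        string line = sb.ToString();

        // Step 3: write that line to a local text file linesPerFile times.
        using (var writer = new StreamWriter(path))
        {
            for (int i = 0; i < linesPerFile; i++)
                writer.WriteLine(line);
        }
    }
}
```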

The exe was run on Windows 7 64-bit with a single, purely mechanical 7200 rpm drive, as I didn’t want the memory of a “hybrid” drive or an mSATA card to taint the results. The trial was run over the course of three days, once per day, starting 5 minutes after the machine was up and running from a cold start. This was to eliminate interference from any other background processes starting up that might detract from the test.

 

So what happens? Give us the scoop already!

Before starting, my hypothesis was that reading each line into the same StringBuilder object would excel, since no time would be spent constantly creating new string objects (strings are immutable, so a new one has to be created and reassigned with each read).
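To make that hypothesis concrete, this is roughly the shape of techniques T6/T7: one StringBuilder instance is reused for every line rather than keeping a separate string per iteration. The method name and file path below are placeholders, not the benchmark’s actual code.

```csharp
using System.IO;
using System.Text;

static class LineReaderSketch
{
    // Roughly T6: reuse a single StringBuilder for every line read.
    // For T7, the StringBuilder would be constructed with a capacity equal to the longest line.
    static void ReadWithReusedStringBuilder(string path)
    {
        var sb = new StringBuilder();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                sb.Clear();      // reuse the same buffer instead of holding a string per iteration
                sb.Append(line);
                // ... work with sb here ...
            }
        }
    }
}
```

Note that StreamReader.ReadLine() already returns a new string for each line, so the StringBuilder only adds a copy on top of that allocation, which may help explain why this approach didn’t dominate in the results.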

All times are indicated in seconds. The lower the number, the faster the technique performed.

Cells containing a “-” indicate the test couldn’t be performed because an “out of memory” exception was thrown. For example, apparently 16GB isn’t enough memory to read a 4,294,967 line text file with 25 Guids per line into a single string. In the result tables below, each column heading shows the number of Guids per line followed by the number of lines per file.

Run #1

| Technique | 5 / 4,294,967 | 5 / 2,147,483 | 5 / 214,748 | 10 / 4,294,967 | 10 / 2,147,483 | 10 / 214,748 | 25 / 4,294,967 | 25 / 2,147,483 | 25 / 214,748 |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.7456 | 1.5756 | 0.2652 | - | 2.8392 | 0.3120 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.4476 | 1.9032 | 0.1872 | - | 3.6504 | 0.4368 | - | - | 0.9360 |
| T3: each line into a string | 2.7768 | 1.3416 | 0.1560 | 5.4912 | 2.7144 | 0.2964 | 13.9620 | 6.9576 | 0.6552 |
| T4: T3 using BufferedStream | 2.6676 | 1.3728 | 0.1716 | 5.2728 | 2.5896 | 0.2808 | 13.8060 | 6.9108 | 0.7020 |
| T5: T4 w/ preset buffer size | 2.7612 | 1.3884 | 0.1560 | 5.0076 | 2.5116 | 0.2964 | 14.0244 | 6.9264 | 0.7176 |
| T6: each line into StringBuilder | 2.9328 | 1.4508 | 0.1716 | 5.5848 | 2.7924 | 0.3432 | 14.0712 | 7.2696 | 0.7020 |
| T7: T6 w/ preset size | 2.7144 | 1.4352 | 0.1716 | 5.5692 | 2.7768 | 0.3120 | 14.1180 | 7.4412 | 0.6708 |
| T8: into preallocated string[] | 5.9748 | 2.8704 | 0.2652 | 13.6968 | 5.1792 | 0.4680 | 57.3301 | 15.9588 | 1.0608 |
| T9: File.ReadAllLines() | 5.7720 | 2.6832 | 0.3276 | 13.1352 | 5.0076 | 0.4836 | 71.9785 | 15.6936 | 1.1388 |

Run #2

| Technique | 5 / 4,294,967 | 5 / 2,147,483 | 5 / 214,748 | 10 / 4,294,967 | 10 / 2,147,483 | 10 / 214,748 | 25 / 4,294,967 | 25 / 2,147,483 | 25 / 214,748 |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.8704 | 1.5444 | 0.1716 | - | 3.1200 | 0.2964 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.4320 | 1.9656 | 0.2028 | - | 3.5568 | 0.4212 | - | - | 0.9204 |
| T3: each line into a string | 2.7612 | 1.3728 | 0.1560 | 5.4132 | 2.7768 | 0.2808 | 13.9776 | 7.1292 | 0.6864 |
| T4: T3 using BufferedStream | 2.7300 | 1.4040 | 0.1560 | 5.4444 | 2.8392 | 0.2964 | 14.0400 | 7.0668 | 0.8580 |
| T5: T4 w/ preset buffer size | 2.7144 | 1.4040 | 0.1560 | 5.3820 | 2.7768 | 0.2964 | 14.8356 | 7.4880 | 0.7176 |
| T6: each line into StringBuilder | 2.8548 | 1.5600 | 0.1716 | 5.5380 | 2.7924 | 0.2964 | 14.2272 | 7.1760 | 0.7488 |
| T7: T6 w/ preset size | 2.6832 | 1.4664 | 0.1872 | 5.5692 | 2.8548 | 0.2964 | 14.2740 | 7.2072 | 0.7020 |
| T8: into preallocated string[] | 6.1776 | 2.8860 | 0.2808 | 15.0228 | 5.7252 | 0.4680 | 58.8745 | 15.9588 | 1.2012 |
| T9: File.ReadAllLines() | 5.9904 | 2.7456 | 0.3120 | 11.5284 | 5.2884 | 0.4992 | 70.9021 | 16.3332 | 1.0608 |

Run #3

| Technique | 5 / 4,294,967 | 5 / 2,147,483 | 5 / 214,748 | 10 / 4,294,967 | 10 / 2,147,483 | 10 / 214,748 | 25 / 4,294,967 | 25 / 2,147,483 | 25 / 214,748 |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.8548 | 1.4664 | 0.1404 | - | 2.8860 | 0.3120 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.2136 | 1.8252 | 0.2340 | - | 3.3384 | 0.4056 | - | - | 0.9048 |
| T3: each line into a string | 2.6208 | 1.3572 | 0.1716 | 5.1636 | 2.5584 | 0.2964 | 13.1820 | 6.6768 | 0.6708 |
| T4: T3 using BufferedStream | 2.6364 | 1.2948 | 0.1248 | 5.2416 | 2.5896 | 0.3744 | 13.1196 | 6.6456 | 0.6864 |
| T5: T4 w/ preset buffer size | 2.6520 | 1.3104 | 0.1248 | 5.2260 | 2.5896 | 0.2964 | 14.1648 | 7.2384 | 0.7644 |
| T6: each line into StringBuilder | 2.8236 | 1.4820 | 0.1560 | 5.3508 | 2.6988 | 0.3120 | 13.4160 | 6.8484 | 0.7020 |
| T7: T6 w/ preset size | 2.7768 | 1.3884 | 0.1716 | 5.3196 | 2.6832 | 0.3120 | 13.4160 | 6.8172 | 0.7176 |
| T8: into preallocated string[] | 5.9748 | 2.7924 | 0.2652 | 13.8216 | 4.7580 | 0.4680 | 57.7513 | 15.5688 | 1.1388 |
| T9: File.ReadAllLines() | 5.7564 | 2.6676 | 0.2808 | 16.0368 | 5.0076 | 0.4863 | 70.1065 | 15.5376 | 1.0608 |

 

The Results:

Looking at the results, there is no clear-cut winner among techniques T3, T4, T5, T6, and T7. I have read lots of postings across the internet, especially on StackOverflow.com, advocating that using a buffered reader is faster. According to my results, that’s not always the case, and even when it is, the difference in time is so negligible one has to ask, “Is it worth it?”
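For clarity, the unbuffered and buffered variants being compared (T3 versus T4/T5) differ only in how the stream is set up. This is a minimal sketch; the path and any explicit buffer size are illustrative, not the exact values used in the benchmark.

```csharp
using System.IO;

static class BufferedVsUnbuffered
{
    // T3: read line by line with a plain StreamReader.
    static void ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null) { /* use line */ }
        }
    }

    // T4: the same loop, but with the FileStream wrapped in a BufferedStream first.
    // T5 is identical except the BufferedStream is given an explicit buffer size.
    static void ReadLinesBuffered(string path)
    {
        using (var fs = File.OpenRead(path))
        using (var bs = new BufferedStream(fs))        // T5: new BufferedStream(fs, bufferSize)
        using (var reader = new StreamReader(bs))
        {
            string line;
            while ((line = reader.ReadLine()) != null) { /* use line */ }
        }
    }
}
```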

There were several more surprises for me:

1) Reading each line into a string, buffered or unbuffered, always topped the list. I was expecting reading into a StringBuilder to dominate.

2) Reading the entire file into a single string or StringBuilder object didn’t perform well at all, relatively speaking.

3) The built-in .Net File.ReadAllLines() method performed practically on par with, or better than, reading each line into a pre-allocated string array (both approaches are sketched below). This surprised me because I thought that allocating and resizing the array as it goes along would be costly to the underlying system. The gap between the two when reading 4,294,967 lines with 25 Guids per line is what I would have expected across the board.
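The two array-based approaches compare roughly as follows. The helper names and the way the line count is supplied are assumptions for illustration; the benchmark knows the count up front because it generated the file.

```csharp
using System.IO;

static class ArrayReadSketch
{
    // T9: let the framework size and fill the array.
    static string[] ReadAll(string path)
    {
        return File.ReadAllLines(path);
    }

    // T8: pre-allocate the array yourself when the line count is known in advance.
    static string[] ReadIntoPreallocatedArray(string path, int lineCount)
    {
        var lines = new string[lineCount];
        using (var reader = new StreamReader(path))
        {
            for (int i = 0; i < lineCount; i++)
                lines[i] = reader.ReadLine();
        }
        return lines;
    }
}
```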

 

So what technique should you use?

On my system, unless someone spots a flaw in my test code, it really makes no significant performance difference whether you use a regular reader or buffered reader. Plus, now you have code and supporting evidence to dispute someone saying a buffered reader is always faster.

Obviously you should test on your system before micro-optimizing this functionality for your .Net application.
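If you do benchmark this yourself, a simple harness along these lines is enough. The delegate being timed and the return format are placeholders; the GC calls mirror the clean-up done between runs in the set-up described above.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkSketch
{
    // Times one read technique and returns the elapsed seconds.
    static double Time(Action readTechnique)
    {
        // Start each run with fresh resources, as in the benchmark set-up.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        var sw = Stopwatch.StartNew();
        readTechnique();
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }
}
```

For example, it could be called as Time(() => File.ReadAllLines(path)) or with any of the other techniques wrapped in a lambda.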

Even though the two techniques that read the entire file contents into an array were the slowest, they could still be the best choice for your application if you have a lot of processing to do for each line. For example, when one line’s value isn’t dependent on the next, each line can be processed independently via parallel processing, such as with a Parallel.For or Parallel.ForEach loop, as sketched below.
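As a rough illustration of that idea, the sketch below reads every line into an array and then processes the lines in parallel. ProcessLine is a hypothetical stand-in for whatever per-line work your application does.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class ParallelProcessingSketch
{
    static void ProcessFile(string path)
    {
        // Read everything first (T9), then fan the independent lines out across cores.
        string[] lines = File.ReadAllLines(path);

        Parallel.ForEach(lines, line =>
        {
            ProcessLine(line);   // hypothetical per-line work; must not depend on other lines
        });
    }

    static void ProcessLine(string line)
    {
        // Placeholder for real processing.
        Console.WriteLine(line.Length);
    }
}
```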

Read my blog post, Using C# .Net: Fastest Way to Read and Process Text Files, to see just how big a difference implementing a Parallel.For loop can make!

The results will astound you. See them at http://cc.davelozinski.com/c-sharp/the-fastest-way-to-read-and-process-text-files

 

Bonus Link!

For all the readers who requested it, here’s C# code to serve as a starting point for reading lines from a text file in batches and processing them in parallel! Enjoy!

 

The Code:

 



David Lozinski