C# .Net: Fastest Way to Read Text Files

This article examines several techniques to determine the fastest way to read a text file in C# .Net.

I have seen a lot of people around the internet asking, “what’s the fastest way to read a text file?” I’ve had to write numerous applications which did this, but never gave it serious consideration until I had to write an application which read text files with several hundred million lines for processing.

 

The Set Up:

I wrote a C# Console application to test 9 different techniques to read a text file. This isn’t an exhaustive list, but I believe it covers how it’s done most of the time.

The code is written in Visual Studio 2012 targeting .Net Framework version 4.5 x64. The source code is available at the end so you can benchmark it on your own system if you wish.

In a nutshell, the code does the following:

1)      Generates a GUID

2)      Creates a string object with that GUID repeated either 5, 10, or 25 times

3)      Writes the string object to a local text file 4,294,967 times, 2,147,483 times, or 214,748 times.

4)      It then reads the text file in using 9 techniques, identified below, clearing all the objects and doing a garbage collection after each run to make sure we start each run with fresh resources:

T1: Reading the entire file into a single string

using (StreamReader sr = File.OpenText(fileName))
{
        string s = sr.ReadToEnd();
        //you then have to process the string
}

T2: Reading the entire file into a single StringBuilder object

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = new StringBuilder();
        sb.Append(sr.ReadToEnd());
        //you then have to process the string
}

T3: Reading each line into a string

using (StreamReader sr = File.OpenText(fileName))
{
        string s = String.Empty;
        while ((s = sr.ReadLine()) != null)
        {
               //we're just testing read speeds
        }
}

T4: Reading each line into a string using a BufferedStream (the “buffered reader” in the results)

using (FileStream fs = File.Open(fileName, ..... ))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
        string s;
        while ((s = sr.ReadLine()) != null)
        {
               //we're just testing read speeds
        }
}

T5: Reading each line into a string using a BufferedStream with a preset buffer size based on the length of the biggest line

using (FileStream fs = File.Open(fileName, ..... ))
//buffer size = UTF-16 byte count of one line (2 bytes per character)
using (BufferedStream bs = new BufferedStream(fs, 
    System.Text.Encoding.Unicode.GetByteCount(g)))
using (StreamReader sr = new StreamReader(bs))
{
        string s;
        while ((s = sr.ReadLine()) != null)
        {
               //we're just testing read speeds
        }
}

T6: Reading each line into a StringBuilder object

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = new StringBuilder();
        //Append(null) adds nothing, so Length is 0 at end of file (and on an empty line)
        while (sb.Append(sr.ReadLine()).Length > 0)
        {
               //we're just testing read speeds
               sb.Clear();
        }
}

T7: Reading each line into a StringBuilder object with its size preset and equal to the size of the biggest line

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = 
            new StringBuilder(g.Length);
        //Append(null) adds nothing, so Length is 0 at end of file (and on an empty line)
        while (sb.Append(sr.ReadLine()).Length > 0)
        {
               //we're just testing read speeds
               sb.Clear();
        }
}

T8: Reading each line into a pre-allocated string array object

AllLines = new string[MAX];
using (StreamReader sr = File.OpenText(fileName))
{
        int x = 0;
        while (!sr.EndOfStream)
        {
               //we're just testing read speeds
               AllLines[x] = sr.ReadLine();
               x += 1;
        }
}

T9: Reading the entire file into a string array object using the .Net File.ReadAllLines() method

AllLines = new string[MAX];    //redundant: ReadAllLines() returns its own array
AllLines = File.ReadAllLines(fileName);

5)      The generated file is then deleted.
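The five steps above can be condensed into a small harness. The following is only a minimal sketch, not the benchmark used for the results: the class and method names are made up for illustration, it times a single technique (T3), and it uses Stopwatch for timing where the full source at the end uses DateTime.Now.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class ReadBenchmarkSketch
{
    // Steps 1-3: write `lineCount` identical lines, each one GUID repeated `guidsPerLine` times.
    public static void GenerateFile(string path, int lineCount, int guidsPerLine)
    {
        string line = string.Concat(Enumerable.Repeat(Guid.NewGuid().ToString(), guidsPerLine));
        using (StreamWriter writer = File.CreateText(path))
        {
            for (int i = 0; i < lineCount; i++)
            {
                writer.WriteLine(line);
            }
        }
    }

    // Technique T3: read line by line, discarding each line; returns the number of lines read.
    public static int CountLines(string path)
    {
        int count = 0;
        using (StreamReader reader = File.OpenText(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    // Step 4: time one run of `technique`, then force a collection so the next run starts clean.
    public static TimeSpan Time(Action technique)
    {
        Stopwatch watch = Stopwatch.StartNew();
        technique();
        watch.Stop();
        GC.Collect();
        return watch.Elapsed;
    }

    static void Main()
    {
        // Hypothetical file name; the smallest configuration from the article is used here.
        string path = Path.Combine(Path.GetTempPath(), "read_benchmark_sketch.txt");
        GenerateFile(path, 214748, 5);
        try
        {
            int lines = 0;
            TimeSpan elapsed = Time(() => lines = CountLines(path));
            Console.WriteLine("T3 read {0} lines in {1:F4} s", lines, elapsed.TotalSeconds);
        }
        finally
        {
            File.Delete(path);   // step 5: remove the generated file
        }
    }
}
```

Passing each technique to Time() as a delegate keeps the timing and clean-up code in one place, so further techniques can be added without repeating it.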

The exe file was installed and run on an Alienware M17X R3 with a single, purely mechanical 7200 rpm drive, as I didn’t want the effects that the memory of a “hybrid” drive or mSATA card might have on the system to taint the results. The Alienware is running Windows 7 64-bit with 16 GB memory on an i7-2820QM processor. The trial was run three times over the course of three days, once each day, starting 5 minutes after the machine was up and running from a cold start. The wait was to keep background processes that launch at start up from detracting from the test.

 

So what happens? Give us the scoop already!

Before starting, my hypothesis was that reading each line into the same StringBuilder object would excel, since no time would be spent constantly creating new string objects (strings are immutable, so a new one has to be created and assigned with each read).

All times are indicated in seconds. The lower the number, the faster the technique performed.

Green cells indicate the winner(s) for that run.

Yellow cells indicate the second runner(s) up.

Cells with a “-” character indicate the test couldn’t be performed because an “out of memory exception” was thrown. For example, apparently 16GB isn’t enough memory to read a 4,294,967 line text file with 25 Guids per line into a single string.
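A likely explanation for where the “-” cells fall is a ceiling on the size of one string rather than the 16 GB of installed memory: ReadToEnd() has to fit the whole file into a single string. The sketch below works through the arithmetic under two assumptions that are not from the test itself: that a single .Net string tops out at roughly 2^30 (about 1.07 billion) characters, and that each line ends with the two-character Windows line break. The class and member names are made up for illustration.

```csharp
using System;

static class StringLimitSketch
{
    // Assumed approximate ceiling on the length of one .Net string (about 2^30 UTF-16 chars, ~2 GB).
    // The exact figure is slightly lower and depends on the runtime.
    public const long ApproxMaxStringChars = 1L << 30;

    // Characters ReadToEnd() must hold: each line is `guidsPerLine` 36-char GUIDs plus "\r\n".
    public static long TotalChars(long lineCount, int guidsPerLine)
    {
        return lineCount * (guidsPerLine * 36L + 2);
    }

    static void Main()
    {
        long[] lineCounts = { 4294967, 2147483, 214748 };
        int[] guidCounts = { 5, 10, 25 };
        foreach (int guids in guidCounts)
        {
            foreach (long lines in lineCounts)
            {
                long chars = TotalChars(lines, guids);
                Console.WriteLine("{0,2} Guids x {1,9:N0} lines = {2,13:N0} chars -> {3}",
                    guids, lines, chars,
                    chars > ApproxMaxStringChars ? "too long for one string" : "fits");
            }
        }
    }
}
```

Under these assumptions, the three combinations that come out above the ceiling (10 Guids with 4,294,967 lines, and 25 Guids with 4,294,967 or 2,147,483 lines) are the same three that show “-” for T1 and T2 in every run.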

Run #1

| Technique | 5 Guids per line, 4,294,967 lines | 5 Guids per line, 2,147,483 lines | 5 Guids per line, 214,748 lines | 10 Guids per line, 4,294,967 lines | 10 Guids per line, 2,147,483 lines | 10 Guids per line, 214,748 lines | 25 Guids per line, 4,294,967 lines | 25 Guids per line, 2,147,483 lines | 25 Guids per line, 214,748 lines |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.7456 | 1.5756 | 0.2652 | - | 2.8392 | 0.3120 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.4476 | 1.9032 | 0.1872 | - | 3.6504 | 0.4368 | - | - | 0.9360 |
| T3: each line into a string | 2.7768 | 1.3416 | 0.1560 | 5.4912 | 2.7144 | 0.2964 | 13.9620 | 6.9576 | 0.6552 |
| T4: T3 using BufferedReader | 2.6676 | 1.3728 | 0.1716 | 5.2728 | 2.5896 | 0.2808 | 13.8060 | 6.9108 | 0.7020 |
| T5: T4 w/ preset buffer size | 2.7612 | 1.3884 | 0.1560 | 5.0076 | 2.5116 | 0.2964 | 14.0244 | 6.9264 | 0.7176 |
| T6: each line into StringBuilder | 2.9328 | 1.4508 | 0.1716 | 5.5848 | 2.7924 | 0.3432 | 14.0712 | 7.2696 | 0.7020 |
| T7: T6 w/ preset size | 2.7144 | 1.4352 | 0.1716 | 5.5692 | 2.7768 | 0.3120 | 14.1180 | 7.4412 | 0.6708 |
| T8: into preallocated string[] | 5.9748 | 2.8704 | 0.2652 | 13.6968 | 5.1792 | 0.4680 | 57.3301 | 15.9588 | 1.0608 |
| T9: File.ReadAllLines() | 5.7720 | 2.6832 | 0.3276 | 13.1352 | 5.0076 | 0.4836 | 71.9785 | 15.6936 | 1.1388 |

Run #2

| Technique | 5 Guids per line, 4,294,967 lines | 5 Guids per line, 2,147,483 lines | 5 Guids per line, 214,748 lines | 10 Guids per line, 4,294,967 lines | 10 Guids per line, 2,147,483 lines | 10 Guids per line, 214,748 lines | 25 Guids per line, 4,294,967 lines | 25 Guids per line, 2,147,483 lines | 25 Guids per line, 214,748 lines |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.8704 | 1.5444 | 0.1716 | - | 3.1200 | 0.2964 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.4320 | 1.9656 | 0.2028 | - | 3.5568 | 0.4212 | - | - | 0.9204 |
| T3: each line into a string | 2.7612 | 1.3728 | 0.1560 | 5.4132 | 2.7768 | 0.2808 | 13.9776 | 7.1292 | 0.6864 |
| T4: T3 using BufferedReader | 2.7300 | 1.4040 | 0.1560 | 5.4444 | 2.8392 | 0.2964 | 14.0400 | 7.0668 | 0.8580 |
| T5: T4 w/ preset buffer size | 2.7144 | 1.4040 | 0.1560 | 5.3820 | 2.7768 | 0.2964 | 14.8356 | 7.4880 | 0.7176 |
| T6: each line into StringBuilder | 2.8548 | 1.5600 | 0.1716 | 5.5380 | 2.7924 | 0.2964 | 14.2272 | 7.1760 | 0.7488 |
| T7: T6 w/ preset size | 2.6832 | 1.4664 | 0.1872 | 5.5692 | 2.8548 | 0.2964 | 14.2740 | 7.2072 | 0.7020 |
| T8: into preallocated string[] | 6.1776 | 2.8860 | 0.2808 | 15.0228 | 5.7252 | 0.4680 | 58.8745 | 15.9588 | 1.2012 |
| T9: File.ReadAllLines() | 5.9904 | 2.7456 | 0.3120 | 11.5284 | 5.2884 | 0.4992 | 70.9021 | 16.3332 | 1.0608 |

Run #3

| Technique | 5 Guids per line, 4,294,967 lines | 5 Guids per line, 2,147,483 lines | 5 Guids per line, 214,748 lines | 10 Guids per line, 4,294,967 lines | 10 Guids per line, 2,147,483 lines | 10 Guids per line, 214,748 lines | 25 Guids per line, 4,294,967 lines | 25 Guids per line, 2,147,483 lines | 25 Guids per line, 214,748 lines |
|---|---|---|---|---|---|---|---|---|---|
| T1: into single string | 2.8548 | 1.4664 | 0.1404 | - | 2.8860 | 0.3120 | - | - | 0.7332 |
| T2: into single StringBuilder | 3.2136 | 1.8252 | 0.2340 | - | 3.3384 | 0.4056 | - | - | 0.9048 |
| T3: each line into a string | 2.6208 | 1.3572 | 0.1716 | 5.1636 | 2.5584 | 0.2964 | 13.1820 | 6.6768 | 0.6708 |
| T4: T3 using BufferedReader | 2.6364 | 1.2948 | 0.1248 | 5.2416 | 2.5896 | 0.3744 | 13.1196 | 6.6456 | 0.6864 |
| T5: T4 w/ preset buffer size | 2.6520 | 1.3104 | 0.1248 | 5.2260 | 2.5896 | 0.2964 | 14.1648 | 7.2384 | 0.7644 |
| T6: each line into StringBuilder | 2.8236 | 1.4820 | 0.1560 | 5.3508 | 2.6988 | 0.3120 | 13.4160 | 6.8484 | 0.7020 |
| T7: T6 w/ preset size | 2.7768 | 1.3884 | 0.1716 | 5.3196 | 2.6832 | 0.3120 | 13.4160 | 6.8172 | 0.7176 |
| T8: into preallocated string[] | 5.9748 | 2.7924 | 0.2652 | 13.8216 | 4.7580 | 0.4680 | 57.7513 | 15.5688 | 1.1388 |
| T9: File.ReadAllLines() | 5.7564 | 2.6676 | 0.2808 | 16.0368 | 5.0076 | 0.4863 | 70.1065 | 15.5376 | 1.0608 |

 

The Results:

Seeing the results, there is no clear-cut winner between techniques T3, T4, T5, T6, & T7. I have read lots of postings across the internet, especially on StackOverflow.com, with people advocating that using a buffered reader is faster. That’s not always the case according to my results. Even when it is, the difference in time is so negligible one has to ask, “Is it worth it?”

There were several more surprises for me:

1)      Reading each line into a string, buffered or unbuffered, always topped the list. I was expecting reading into a StringBuilder to dominate. In hindsight, ReadLine() still creates a new string for every line, so appending it to a StringBuilder only adds a copy.

2)      Reading the entire file into a single string or StringBuilder object didn’t perform well at all, relatively speaking.

3)      The built-in .Net File.ReadAllLines() method performed practically on par with, or better than, reading each line into a pre-allocated string array. I’m surprised by this because I thought that allocating and resizing the array as it goes along would have been costly to the underlying system. The difference in performance when reading 4,294,967 lines with 25 guids per line is the result I would have expected everywhere between the two.

So what technique should you use?

On my system, unless someone spots a flaw in my test code, it really makes no significant performance difference whether you use a regular reader or buffered reader. Plus, now you have code and supporting evidence to dispute someone saying a buffered reader is always faster.

Obviously you should test on your system before micro-optimizing this functionality for your .Net application.
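One way to run such a check against a file of your own is sketched below. It compares only T3 and T4, takes the file path as a command-line argument, and uses made-up class and method names. A possible reason the two come out so close is that FileStream and StreamReader already keep internal buffers of their own, so the extra BufferedStream layer has little left to save; treat that as a plausible explanation, not something the tests above prove.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferingCheckSketch
{
    // T3: StreamReader directly over the file.
    public static int CountPlain(string path)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return CountLines(reader);
        }
    }

    // T4: the same loop with a BufferedStream between the FileStream and the StreamReader.
    public static int CountBuffered(string path)
    {
        using (FileStream file = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (BufferedStream buffered = new BufferedStream(file))
        using (StreamReader reader = new StreamReader(buffered))
        {
            return CountLines(reader);
        }
    }

    // Reads to the end of the stream, discarding each line, and returns the line count.
    static int CountLines(StreamReader reader)
    {
        int count = 0;
        while (reader.ReadLine() != null)
        {
            count++;
        }
        return count;
    }

    // Times a single run of `technique` and prints the line count and elapsed seconds.
    static void Report(string label, Func<string, int> technique, string path)
    {
        Stopwatch watch = Stopwatch.StartNew();
        int lines = technique(path);
        watch.Stop();
        Console.WriteLine("{0,-10} {1} lines in {2:F4} s", label, lines, watch.Elapsed.TotalSeconds);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: BufferingCheckSketch <text file>");
            return;
        }
        // Several alternating runs, so that operating-system file caching affects both techniques alike.
        for (int run = 0; run < 3; run++)
        {
            Report("plain", CountPlain, args[0]);
            Report("buffered", CountBuffered, args[0]);
        }
    }
}
```

Both methods should report the same line count for the same file; only the timings are expected to differ, and on a small file the differences will mostly be noise.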

Even though the two techniques of reading the entire file contents into an array were the slowest, in the end these could be the best way for your application if you have a lot of processing to do for each line. For example, if one line’s value isn’t dependent on the next, you can work each line independently via parallel processing, such as with a Parallel.For or Parallel.ForEach loop.
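As a rough illustration of that idea, the sketch below reads the file with File.ReadAllLines() (T9) and then hands the lines to Parallel.For. The per-line work is a stand-in (it only counts characters), and the class and method names are made up for this example; real processing would replace ProcessLine.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class ParallelLinesSketch
{
    // Hypothetical per-line work: here, just the number of characters in the line.
    static int ProcessLine(string line)
    {
        return line.Length;
    }

    // Reads every line up front (T9), then processes the lines in parallel.
    public static long TotalChars(string path)
    {
        string[] allLines = File.ReadAllLines(path);
        long total = 0;
        Parallel.For<long>(0, allLines.Length,
            () => 0L,                                                     // each worker starts its own subtotal
            (i, state, subtotal) => subtotal + ProcessLine(allLines[i]),  // no shared state touched per line
            subtotal => Interlocked.Add(ref total, subtotal));            // merge once per worker
        return total;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ParallelLinesSketch <text file>");
            return;
        }
        Console.WriteLine("Total characters: " + TotalChars(args[0]));
    }
}
```

Keeping a subtotal per worker and merging it once at the end avoids contending on a shared counter for every line, which matters when the per-line work is as cheap as it is here.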

Read my blog post, “Using C# .Net: Fastest Way to Read and Process Text Files”, to see just how big of a difference implementing a Parallel.For loop can make!

The results will astound. See them at http://cc.davelozinski.com/c-sharp/the-fastest-way-to-read-and-process-text-files

 

The Code:

using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;

namespace TestApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime end;
            DateTime start = DateTime.Now;

            Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());
            Console.WriteLine();

            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 500)), 5);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 500)), 10);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 500)), 25);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 1000)), 5);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 1000)), 10);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 1000)), 25);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 5);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 10);
            TestReadingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 25);

            end = DateTime.Now;
            Console.WriteLine();
            Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());
            Console.WriteLine("### Overall Run Time: " + (end - start));

            Console.WriteLine();
            Console.WriteLine("Hit Enter to Exit");
            Console.ReadLine();

        }

        //####################################################

        //Does a comparison of reading all the lines in from a file. Which way is fastest?
        static void TestReadingLinesFromFile(int numberOfLines, int numTimesGuidRepeated)
        {
            Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);
            Console.WriteLine("######## Number of lines in file: " + numberOfLines);
            Console.WriteLine("######## Number of times Guid repeated on each line: " + numTimesGuidRepeated);
            Console.WriteLine("###########################################################");
            Console.WriteLine();
            //Guid.NewGuid() generates a random GUID; new Guid() would always give the all-zero GUID
            string g = String.Join("", Enumerable.Repeat(Guid.NewGuid().ToString(), numTimesGuidRepeated));
            string[] AllLines = null;
            string fileName = "Performance_Test_File.txt";
            int MAX = numberOfLines;
            DateTime end;
            DateTime start = DateTime.Now;

            //Create the file populating it with GUIDs
            Console.WriteLine("Generating file: " + start.ToLongTimeString());
            using (StreamWriter sw = File.CreateText(fileName))
            {
                for (int x = 0; x < MAX; x++)
                {
                    sw.WriteLine(g);
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Just read everything into one string
            Console.WriteLine("Reading file reading to end into string: ");
            start = DateTime.Now;
            try
            {
                using (StreamReader sr = File.OpenText(fileName))
                {
                    string s = sr.ReadToEnd();
                    //Obviously you'd then have to process the string
                }
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Read the entire contents into a StringBuilder object
            Console.WriteLine("Reading file reading to end into stringbuilder: ");
            start = DateTime.Now;
            try
            {
                using (StreamReader sr = File.OpenText(fileName))
                {
                    StringBuilder sb = new StringBuilder();
                    sb.Append(sr.ReadToEnd());
                    //Obviously you'd then have to process the string
                }
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Standard and probably most common way of reading a file. 
            Console.WriteLine("Reading file assigning each line to string: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                string s = String.Empty;
                while ((s = sr.ReadLine()) != null)
                {
                    //we're just testing read speeds
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Doing it the most common way, but using a Buffered Reader now.
            Console.WriteLine("Buffered reading file assigning each line to string: ");
            start = DateTime.Now;
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (BufferedStream bs = new BufferedStream(fs))
            using (StreamReader sr = new StreamReader(bs))
            {
                string s;
                while ((s = sr.ReadLine()) != null)
                {
                    //we're just testing read speeds
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Reading each line using a buffered reader again, but setting the buffer size since we know what it will be.
            Console.WriteLine("Buffered reading with preset buffer size assigning each line to string: ");
            start = DateTime.Now;
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            //buffer size = UTF-16 byte count of one line (2 bytes per character)
            using (BufferedStream bs = new BufferedStream(fs, Encoding.Unicode.GetByteCount(g)))
            using (StreamReader sr = new StreamReader(bs))
            {
                string s;
                while ((s = sr.ReadLine()) != null)
                {
                    //we're just testing read speeds
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Read every line of the file reusing a StringBuilder object to save on string memory allocation times
            Console.WriteLine("Reading file assigning each line to StringBuilder: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                StringBuilder sb = new StringBuilder();
                //Append(null) adds nothing, so Length is 0 at end of file (and on an empty line)
                while (sb.Append(sr.ReadLine()).Length > 0)
                {
                    //we're just testing read speeds
                    sb.Clear();
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Reading each line into a StringBuilder, but setting the StringBuilder object to an initial
            //size since we know how long the longest line in the file is.
            Console.WriteLine("Reading file assigning each line to preset size StringBuilder: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                StringBuilder sb = new StringBuilder(g.Length);
                //Append(null) adds nothing, so Length is 0 at end of file (and on an empty line)
                while (sb.Append(sr.ReadLine()).Length > 0)
                {
                    //we're just testing read speeds
                    sb.Clear();
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();

            Thread.Sleep(1000);     //give disk hardware time to recover

            //Read each line into an array index. 
            Console.WriteLine("Reading each line into string array: ");
            start = DateTime.Now;
            try
            {
                AllLines = new string[MAX];    //only allocate memory here
                using (StreamReader sr = File.OpenText(fileName))
                {
                    int x = 0;
                    while (!sr.EndOfStream)
                    {
                        //we're just testing read speeds
                        AllLines[x] = sr.ReadLine();
                        x += 1;
                    }
                }
                end = DateTime.Now;

                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            finally
            {
                if (AllLines != null)
                {
                    Array.Clear(AllLines, 0, AllLines.Length);
                    AllLines = null;
                }
            }

            GC.Collect();

            Thread.Sleep(1000);

            //Read the entire file using File.ReadAllLines. 
            Console.WriteLine("Performing File ReadAllLines into array: ");
            start = DateTime.Now;
            try
            {
                AllLines = new string[MAX];    //redundant: ReadAllLines() returns its own array, but this allocation is part of the timed run
                AllLines = File.ReadAllLines(fileName);
                end = DateTime.Now;

                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            finally
            {
                if (AllLines != null)
                {
                    Array.Clear(AllLines, 0, AllLines.Length);
                    AllLines = null;
                }
            }

            File.Delete(fileName);
            fileName = null;

            GC.Collect();
        }
    }

}

Comments:

  • Parthiban

    Great Work Buddy!!!

  • prasun9

    I would like to share my findings, which differed based on the file location (local or remote) and file type (Text or Binary).
    1. If the file is located on a remote drive then it is much better to read the file at once and then parse the MemoryStream one line at a time rather than using FileStream, BufferedStream or preset buffer size.
    2. If the file is a Binary file then File.ReadAllBytes() is much much faster (3-4 times) than File.ReadAllText() or File.ReadAllLines()

  • Chris Stiefeling

    We’ve seen huge differences when the files reside on network shares vs a local drive. It would be interesting to see the results from a network perspective.

  • http://www.nenutech.com abid ahmed

    i am java developer i have a question
    will this work on earlier versions of visual studio?

    • David Lozinski

      Obviously depends on how “early” you mean by “earlier versions”. I think all these techniques should work from VS 2005 upwards. It’s not so much dependent upon the version of Visual Studio you have. Rather, the version of .Net installed (since some versions of Visual Studio allow you to “target” a specific .Net runtime version).

  • Moeslund

    Thank you for this demonstration. Interesting reading and very useful. I have now changed file handling in my latest project to use T4 instead of T9.

  • http://www.leandroribeiro.com Leandro Ribeiro

    Good article. I’m also surprised at the results.

    • David Lozinski

      Thank you. Yeah, no more making sure to use buffered readers for me. I’d be interested what results other people get on their machines when they run the code as well. I’ve just about finished up the article on the fastest way to process a file once it’s read… stay tuned as it will be posted shortly. :-)