The Fastest Way to Read and Process Text Files using C# .Net

This article benchmarks several techniques to determine the fastest way to read and process text files in C# .Net.

This builds upon my previous article, what’s the fastest way to read a text file (http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files). Some applications require extensive processing of each line of data from the file, so we need to test more than raw reading speeds. Let’s test the various reading techniques while including some mathematical number crunching on each line.

Setting Things Up:

I wrote a C# Console application to test many different techniques to read a text file and process the lines contained therein. This isn’t an exhaustive list, but I believe it covers how it’s done most of the time.

The code is written in Visual Studio 2012 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system if you wish.

In a nutshell, the code does the following:

Generates a GUID
Creates a string object with that GUID repeated either 5, 10, or 25 times
Writes the string object to a local text file 429,496 or 214,748 times.
It then reads the text file in using various techniques, identified below, clearing all the objects and doing a garbage collection after each run to make sure we start each run with fresh resources:

Technique and code snippet:

Reading the entire file into a single string using the StreamReader ReadToEnd() method, then process the entire string.

using (StreamReader sr = File.OpenText(fileName))
{
        string s = sr.ReadToEnd();
        TestReadingAndProcessingLinesFromFile_DoStuff(s);
}

Reading the entire file into a single StringBuilder object using the ReadToEnd() method, then process the entire string.

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = new StringBuilder();
        sb.Append(sr.ReadToEnd());
        TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString());
}

Reading each line into a string, and process line by line.

using (StreamReader sr = File.OpenText(fileName))
{
        string s = String.Empty;
        while ((s = sr.ReadLine()) != null)
        {
               TestReadingAndProcessingLinesFromFile_DoStuff(s);
        }
}

Reading each line into a string using a BufferedStream, and process line by line.

using (FileStream fs = File.Open(fileName, ..... ))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
        string s;
        while ((s = sr.ReadLine()) != null)
        {
               TestReadingAndProcessingLinesFromFile_DoStuff(s);
        }
}

Reading each line into a string using a BufferedStream with a preset buffer size equal to the size of the biggest line, and process line by line.

using (FileStream fs = File.Open(fileName, ..... ))
using (BufferedStream bs 
	= new BufferedStream(fs, System.Text.ASCIIEncoding.Unicode.GetByteCount(g)))
using (StreamReader sr = new StreamReader(bs))
{
        string s;
        while ((s = sr.ReadLine()) != null)
        {
               TestReadingAndProcessingLinesFromFile_DoStuff(s);
        }
}

Reading each line into a StringBuilder object, and process line by line.

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = new StringBuilder();
        while (sb.Append(sr.ReadLine()).Length > 0)
        {
                TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString());
               sb.Clear();
        }
}

Reading each line into a StringBuilder object with its size preset and equal to the size of the biggest line, and process line by line.

using (StreamReader sr = File.OpenText(fileName))
{
        StringBuilder sb = new StringBuilder(g.Length);
        while (sb.Append(sr.ReadLine()).Length > 0)
        {
                TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString());
               sb.Clear();
        }
}

Reading each line into a pre-allocated string array object, then run a Parallel.For loop to process all the lines in parallel.

AllLines = new string[MAX]; //only allocate memory here
using (StreamReader sr = File.OpenText(fileName))
{
        int x = 0;
        while (!sr.EndOfStream)
        {
               AllLines[x] = sr.ReadLine();
               x += 1;
        }
} //CLOSE THE FILE because we are now DONE with it.
Parallel.For(0, AllLines.Length, x =>
{
    TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]);
});

Reading the entire file into a string array object using the .Net ReadAllLines() method, then run a Parallel.For loop to process all the lines in parallel.

AllLines = new string[MAX]; //redundant: File.ReadAllLines() below returns its own array and this one is discarded
AllLines = File.ReadAllLines(fileName);
Parallel.For(0, AllLines.Length, x =>
{
    TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]);
});

Each line in the file is processed by being split into a string array containing its individual guids. Then each string is parsed character by character to determine if it’s a number and, if so, a mathematical calculation is done based on it.
The generated file is then deleted.
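
The per-line processing step described above can be sketched roughly as follows. This is a simplified illustration rather than the benchmark's own method: the names LineProcessingSketch and CountDigitsPerToken are hypothetical, and it only counts the digits in each piece instead of running a calculation on them.

```csharp
using System;

public static class LineProcessingSketch
{
    // Splits a line on spaces and counts the decimal digits found in each piece.
    // Hypothetical helper for illustration; the benchmark does bogus math instead of counting.
    public static int[] CountDigitsPerToken(string line)
    {
        // Note the array initializer: new[] { ' ' } is a one-element array holding a space.
        string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int[] digitCounts = new int[tokens.Length];

        for (int i = 0; i < tokens.Length; i++)
        {
            foreach (char c in tokens[i])
            {
                if (c >= '0' && c <= '9')
                {
                    digitCounts[i]++;
                }
            }
        }
        return digitCounts;
    }
}
```

Checking the character range directly avoids creating a one-character string per character, which is what calling c.ToString() before int.TryParse does.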

The tests were run on a Windows 7 64-bit machine with 16GB of memory, using a purely mechanical 7200 rpm drive, as I didn’t want the effects that the memory of a “hybrid” drive or mSATA card might have on the system to taint the results.

This trial was run once, waiting 5 minutes after the machine was up and running from a cold start. This was to eliminate any other background processes starting up which might detract from the test. There was no reason to run this test multiple times because, as you’ll see, there are clear winners and losers.
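
The test code times each technique by taking DateTime.Now before and after, with a garbage collection between runs. If you want to repeat the measurements yourself, a Stopwatch-based helper is one possible alternative. The sketch below is an assumption on my part and not what produced the published numbers; TimingSketch and Measure are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action once and returns the elapsed wall-clock time.
    // Hypothetical helper; the benchmark itself uses DateTime.Now.
    public static TimeSpan Measure(Action action)
    {
        // Collect garbage first so leftovers from a previous run
        // are less likely to be charged to this one.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call such as TimingSketch.Measure(() => File.ReadAllLines(fileName)) would then return the elapsed time as a TimeSpan.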

The Runs:

Before starting, my hypothesis was that the techniques that read the entire file into an array and then use parallel for loops to process all the lines would win hands down.

Let’s see what happened on my machine. Green cells indicate the winner(s) for that run; yellow cells indicate the runners-up.

All times are in minutes:seconds format with fractional seconds (the minutes part is omitted in the 5 and 10 guid columns). Lower numbers indicate faster performance.

Run #1
Technique	5 Guids Per Line, 429,496 lines	5 Guids Per Line, 214,748 lines	10 Guids Per Line, 429,496 lines	10 Guids Per Line, 214,748 lines	25 Guids Per Line, 429,496 lines	25 Guids Per Line, 214,748 lines
T1: string, ReadToEnd, process	26.0165	12.8108	51.2161	25.6457	2:08.0661	1:04.0958
T2: StringBuilder, ReadToEnd, process	25.8557	12.8692	51.2843	25.7571	2:08.7938	1:04.1300
T3: StreamReader, read line by line, process	25.5055	12.9920	50.9340	25.6576	2:07.8043	1:03.8621
T4: BufferedStream, read line by line, process	25.5241	12.8205	51.0251	25.5980	2:07.7404	1:03.8547
T5: BufferedStream with buffer size preset, read line by line, process	25.4960	12.8065	50.9899	25.5554	2:08.1822	1:04.0174
T6: StreamReader, read line by line into StringBuilder, process	25.6190	12.8883	51.0363	25.6011	2:07.9028	1:03.8462
T7: as above with StringBuilder size preset	25.5769	12.8838	51.3235	25.5201	2:08.4510	1:03.8346
T8: StreamReader, read into preset String[], process using Parallel.For	07.3555	03.9828	14.7095	07.8946	0:36.2732	0:18.7467
T9: ReadAllLines into String[], process using Parallel.For	07.2808	03.9742	14.8749	07.9938	0:38.9168	0:19.1223

Sha-Bam! Parallel Processing Dominates!

Seeing the results, there is no clear-cut winner between techniques T1 – T7. T8 & T9, which implemented the parallel processing techniques, completely dominated. Those techniques always finished in less than a third (33%) of the time it took any technique processing line by line.
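
To check the "less than a third" figure against the table, the times can be converted to seconds and divided. The following is a minimal sketch for doing that; TimeRatioSketch, ToSeconds and Fraction are hypothetical helper names, and the only assumption about the input is the minutes:seconds layout used in the table.

```csharp
using System;
using System.Globalization;

public static class TimeRatioSketch
{
    // Converts a table entry such as "2:08.0661" or "26.0165" to seconds.
    public static double ToSeconds(string time)
    {
        string[] parts = time.Split(':');
        // The last part is always seconds with a fractional component.
        double seconds = double.Parse(parts[parts.Length - 1], CultureInfo.InvariantCulture);
        if (parts.Length == 2)
        {
            // An optional leading part holds whole minutes.
            seconds += 60.0 * int.Parse(parts[0], CultureInfo.InvariantCulture);
        }
        return seconds;
    }

    // Returns the parallel time as a fraction of the sequential time.
    public static double Fraction(string parallelTime, string sequentialTime)
    {
        return ToSeconds(parallelTime) / ToSeconds(sequentialTime);
    }
}
```

For example, Fraction("0:38.9168", "2:07.7404") compares the slowest parallel result (T9, 25 guids per line, larger file) with the fastest line-by-line result in the same column (T4) and comes out at roughly 0.30.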

The surprise for me came where each line was 10 guids in length. From that point forward, the .Net inbuilt File.ReadAllLines() method started performing slower. This wasn’t quite so evident when just plain reading a file. However, it indicates that if you really want to micro-optimize your code for speed, always pre-allocate the size of a string array when possible.

In Summary:

On my system, unless someone spots a flaw in my test code, reading an entire file into an array and then processing line-by-line using a parallel loop proved significantly more beneficial than reading a line, processing a line. Unfortunately I still see a lot of C# programmers and C# code running .Net 4 (or above) doing the age old “read a line, process line, repeat until end of file” technique instead of “read all the lines into memory and then process”. The performance difference is so great it even makes up for the loss of time when just reading a file.
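
The test code throws away the result of each line's calculation. In real code the results usually need to be combined, and with a parallel loop that has to be done in a thread-safe way. Here is a hedged sketch of the "read all the lines into memory and then process" pattern using the thread-local overload of Parallel.For. ParallelSumSketch and CountDigits are hypothetical names, and counting digits merely stands in for real work.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSumSketch
{
    // Reads every line into memory, then counts digit characters across all lines in parallel.
    public static long CountDigits(string fileName)
    {
        string[] lines = File.ReadAllLines(fileName);
        long total = 0;

        Parallel.For(0, lines.Length,
            () => 0L,                                    // each worker starts with its own subtotal
            (i, state, local) =>
            {
                foreach (char c in lines[i])
                {
                    if (c >= '0' && c <= '9') local++;
                }
                return local;                            // carried over to the worker's next iteration
            },
            local => Interlocked.Add(ref total, local)); // merge each worker's subtotal atomically

        return total;
    }
}
```

Keeping a subtotal per worker means the shared variable is only touched once per worker rather than once per line, so the loop does not serialize on a lock.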

This test code is just doing mathematical calculations too. The difference in performance may be even greater if you need to do other things to process your data as well, such as running a database query.

Obviously you should test on your system before micro-optimizing this functionality for your .Net application.

Otherwise, thanks to .Net 4, parallel processing is now easy to accomplish in C#. It’s time to break out of old patterns and start taking advantage of the power made available to us.

Bonus Link!

For all the readers who requested it, here’s C# code to serve as a starting point for you to do your own reading lines from a text file in batches and processing in parallel! Enjoy!
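
The linked code is not reproduced here. As a rough idea of what batching can look like, the following independent sketch reads a fixed number of lines at a time and processes each batch in parallel before reading the next. BatchSketch, ProcessInBatches, batchSize and processLine are all hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class BatchSketch
{
    // Reads up to batchSize lines at a time and processes each batch in parallel.
    // Returns the number of lines processed.
    public static int ProcessInBatches(string fileName, int batchSize, Action<string> processLine)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");

        int totalLines = 0;
        List<string> batch = new List<string>(batchSize);

        using (StreamReader reader = File.OpenText(fileName))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                batch.Add(line);
                if (batch.Count == batchSize)
                {
                    // Parallel.ForEach returns only after every line in the batch is done,
                    // so the list can safely be reused afterwards.
                    Parallel.ForEach(batch, processLine);
                    totalLines += batch.Count;
                    batch.Clear();
                }
            }
        }

        // Process whatever is left over after the last full batch.
        if (batch.Count > 0)
        {
            Parallel.ForEach(batch, processLine);
            totalLines += batch.Count;
        }
        return totalLines;
    }
}
```

Compared with reading the whole file into an array, this caps memory use at roughly one batch of lines, at the cost of reading and processing taking turns instead of overlapping.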

The Code:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Threading;
namespace TestApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime end;
            DateTime start = DateTime.Now;
            Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());
            Console.WriteLine();
            //Integer division already truncates, so no Math.Floor or casting is needed
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 5000, 5);     //429,496 lines
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 5000, 10);
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 5000, 25);
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 10000, 5);    //214,748 lines
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 10000, 10);
            TestReadingAndProcessingLinesFromFile(Int32.MaxValue / 10000, 25);
            end = DateTime.Now;
            Console.WriteLine();
            Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());
            Console.WriteLine("### Overall Run Time: " + (end - start));
            Console.WriteLine();
            Console.WriteLine("Hit Enter to Exit");
            Console.ReadLine();
        }
        //####################################################
        //Does a comparison of reading all the lines in from a file and performing some rudimentary
        //operations on them. Which way is fastest?
        static void TestReadingAndProcessingLinesFromFile(int numberOfLines, int numTimesGuidRepeated)
        {
            Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);
            Console.WriteLine("######## Number of lines in file: " + numberOfLines);
            Console.WriteLine("######## Number of times Guid repeated on each line: " + numTimesGuidRepeated);
            Console.WriteLine("###########################################################");
            Console.WriteLine();
            //Guid.NewGuid() generates a random GUID; "new Guid()" always produces the all-zero GUID
            string g = String.Join(" ", Enumerable.Repeat(Guid.NewGuid().ToString(), numTimesGuidRepeated));
            string[] AllLines = null;
            string fileName = "Performance_Test_File.txt";
            int MAX = numberOfLines;
            DateTime end;
            DateTime start = DateTime.Now;
            //Create the file populating it with GUIDs
            Console.WriteLine("Generating file: " + start.ToLongTimeString());
            using (StreamWriter sw = File.CreateText(fileName))
            {
                for (int x = 0; x < MAX; x++)
                {
                    sw.WriteLine(g);
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Just read everything into one string
            Console.WriteLine("Reading file reading to end into string: ");
            start = DateTime.Now;
            try
            {
                using (StreamReader sr = File.OpenText(fileName))
                {
                    string s = sr.ReadToEnd();
                    TestReadingAndProcessingLinesFromFile_DoStuff(s);
                }
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Read the entire contents into a StringBuilder object
            Console.WriteLine("Reading file reading to end into stringbuilder: ");
            start = DateTime.Now;
            try
            {
                using (StreamReader sr = File.OpenText(fileName))
                {
                    StringBuilder sb = new StringBuilder();
                    sb.Append(sr.ReadToEnd());
                    TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work
                }
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Standard and probably most common way of reading a file. 
            Console.WriteLine("Reading file assigning each line to string: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                string s = String.Empty;
                while ((s = sr.ReadLine()) != null)
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Doing it the most common way, but using a Buffered Reader now.
            Console.WriteLine("Buffered reading file assigning each line to string: ");
            start = DateTime.Now;
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (BufferedStream bs = new BufferedStream(fs))
            using (StreamReader sr = new StreamReader(bs))
            {
                string s;
                while ((s = sr.ReadLine()) != null)
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Reading each line using a buffered reader again, but setting the buffer size since we know what it will be.
            Console.WriteLine("Buffered reading with preset buffer size assigning each line to string: ");
            start = DateTime.Now;
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (BufferedStream bs = new BufferedStream(fs, System.Text.ASCIIEncoding.Unicode.GetByteCount(g)))
            using (StreamReader sr = new StreamReader(bs))
            {
                string s;
                while ((s = sr.ReadLine()) != null)
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Read every line of the file reusing a StringBuilder object to save on string memory allocation times
            Console.WriteLine("Reading file assigning each line to StringBuilder: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                StringBuilder sb = new StringBuilder();
                while (sb.Append(sr.ReadLine()).Length > 0)
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work
                    sb.Clear();
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Reading each line into a StringBuilder, but setting the StringBuilder object to an initial
            //size since we know how long the longest line in the file is.
            Console.WriteLine("Reading file assigning each line to preset size StringBuilder: ");
            start = DateTime.Now;
            using (StreamReader sr = File.OpenText(fileName))
            {
                StringBuilder sb = new StringBuilder(g.Length);
                while (sb.Append(sr.ReadLine()).Length > 0)
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work
                    sb.Clear();
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine();
            GC.Collect();
            Thread.Sleep(1000);     //give disk hardware time to recover
            //Read each line into an array index. 
            Console.WriteLine("Reading each line into string array. Process with Parallel.For: ");
            start = DateTime.Now;
            try
            {
                AllLines = new string[MAX];    //only allocate memory here
                using (StreamReader sr = File.OpenText(fileName))
                {
                    int x = 0;
                    while (!sr.EndOfStream)
                    {
                        //we're just testing read speeds
                        AllLines[x] = sr.ReadLine();
                        x += 1;
                    }
                } //CLOSE THE FILE because we are now DONE with it.
                Parallel.For(0, AllLines.Length, x =>
                    {
                        TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]); //to simulate work
                    });
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            finally
            {
                if (AllLines != null)
                {
                    Array.Clear(AllLines, 0, AllLines.Length);
                    AllLines = null;
                }
            }
            GC.Collect();
            Thread.Sleep(1000);
            //Read the entire file using File.ReadAllLines. 
            Console.WriteLine("Performing File ReadAllLines into array. Process with Parallel.For: ");
            start = DateTime.Now;
            try
            {
                AllLines = new string[MAX];    //redundant: File.ReadAllLines() below returns its own array and this one is discarded
                AllLines = File.ReadAllLines(fileName);
                Parallel.For(0, AllLines.Length, x =>
                {
                    TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]); //to simulate work
                });
                end = DateTime.Now;
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (OutOfMemoryException)
            {
                end = DateTime.Now;
                Console.WriteLine("Not enough memory. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            catch (Exception)
            {
                end = DateTime.Now;
                Console.WriteLine("EXCEPTION. Couldn't perform this test.");
                Console.WriteLine("Finished at: " + end.ToLongTimeString());
                Console.WriteLine("Time: " + (end - start));
                Console.WriteLine();
            }
            finally
            {
                if (AllLines != null)
                {
                    Array.Clear(AllLines, 0, AllLines.Length);
                    AllLines = null;
                }
            }
            File.Delete(fileName);
            fileName = null;
            GC.Collect();
        }
        //Just simulates doing work on a line read from an input file
        static void TestReadingAndProcessingLinesFromFile_DoStuff(string s)
        {
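            //Split on the space character. Note that "new char[' ']" allocates a 32-element array
            //of '\0' characters (' ' converts to 32), so it never splits on spaces.
            string[] sa = s.Split(new char[] { ' ' });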
            int[] ia = new int[sa.Length];
            int num = 0;
            for (int x = 0; x < sa.Length; x++)
            {
                foreach (char c in sa[x])
                {
                    if (int.TryParse(c.ToString(), out num))
                    {   //just doing some bogus mathematical calculations to simulate work
                        ia[x] = (int)((Math.Sqrt(Math.Log(num) % Math.Log10(num))) * (Math.Log(Math.Log10(num) / Math.Sqrt(num))));
                    }
                } 
            }
            //clean up
            Array.Clear(ia, 0, ia.Length);
            Array.Clear(sa, 0, sa.Length);
            ia = null;
            sa = null;
        }
    } //class
} //namespace

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

using System;

using System.Collections.Generic;

using System.Collections;

using System.Collections.Concurrent;

using System.IO;

using System.Linq;

using System.Text;

using System.Text.RegularExpressions;

using System.Threading.Tasks;

using System.Threading;

namespace TestApplication

{

class Program

{

static void Main(string[] args)

{

DateTime end;

DateTime start = DateTime.Now;

Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());

Console.WriteLine();

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 5000)), 5);

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 5000)), 10);

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 5000)), 25);

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 5);

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 10);

TestReadingAndProcessingLinesFromFile((int)Math.Floor((double)(Int32.MaxValue / 10000)), 25);

end = DateTime.Now;

Console.WriteLine();

Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());

Console.WriteLine("### Overall Run Time: " + (end - start));

Console.WriteLine();

Console.WriteLine("Hit Enter to Exit");

Console.ReadLine();

}

//####################################################

//Does a comparison of reading all the lines in from a file and performing some rudimentary

//operations on them. Which way is fastest?

static void TestReadingAndProcessingLinesFromFile(int numberOfLines, int numTimesGuidRepeated)

{

Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);

Console.WriteLine("######## Number of lines in file: " + numberOfLines);

Console.WriteLine("######## Number of times Guid repeated on each line: " + numTimesGuidRepeated);

Console.WriteLine("###########################################################");

Console.WriteLine();

string g = String.Join(" ", Enumerable.Repeat(new Guid().ToString(), numTimesGuidRepeated));

string[] AllLines = null;

string fileName = "Performance_Test_File.txt";

int MAX = numberOfLines;

DateTime end;

DateTime start = DateTime.Now;

//Create the file populating it with GUIDs

Console.WriteLine("Generating file: " + start.ToLongTimeString());

using (StreamWriter sw = File.CreateText(fileName))

{

for (int x = 0; x < MAX; x++)

{

sw.WriteLine(g);

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Just read everything into one string

Console.WriteLine("Reading file reading to end into string: ");

start = DateTime.Now;

try

{

using (StreamReader sr = File.OpenText(fileName))

{

string s = sr.ReadToEnd();

TestReadingAndProcessingLinesFromFile_DoStuff(s);

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (OutOfMemoryException)

{

end = DateTime.Now;

Console.WriteLine("Not enough memory. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (Exception)

{

end = DateTime.Now;

Console.WriteLine("EXCEPTION. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Read the entire contents into a StringBuilder object

Console.WriteLine("Reading file reading to end into stringbuilder: ");

start = DateTime.Now;

try

{

using (StreamReader sr = File.OpenText(fileName))

{

StringBuilder sb = new StringBuilder();

sb.Append(sr.ReadToEnd());

TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (OutOfMemoryException)

{

end = DateTime.Now;

Console.WriteLine("Not enough memory. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (Exception)

{

end = DateTime.Now;

Console.WriteLine("EXCEPTION. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Standard and probably most common way of reading a file.

Console.WriteLine("Reading file assigning each line to string: ");

start = DateTime.Now;

using (StreamReader sr = File.OpenText(fileName))

{

string s = String.Empty;

while ((s = sr.ReadLine()) != null)

{

TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work

}

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Doing it the most common way, but using a Buffered Reader now.

Console.WriteLine("Buffered reading file assigning each line to string: ");

start = DateTime.Now;

using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))

using (BufferedStream bs = new BufferedStream(fs))

using (StreamReader sr = new StreamReader(bs))

{

string s;

while ((s = sr.ReadLine()) != null)

{

TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work

}

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Reading each line using a buffered reader again, but setting the buffer size since we know what it will be.

Console.WriteLine("Buffered reading with preset buffer size assigning each line to string: ");

start = DateTime.Now;

using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))

//Encoding.Unicode is UTF-16, so this buffer is roughly twice the on-disk (UTF-8) size of one line

using (BufferedStream bs = new BufferedStream(fs, System.Text.Encoding.Unicode.GetByteCount(g)))

using (StreamReader sr = new StreamReader(bs))

{

string s;

while ((s = sr.ReadLine()) != null)

{

TestReadingAndProcessingLinesFromFile_DoStuff(s); //to simulate work

}

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

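//Read every line of the file reusing a StringBuilder object to save on string memory allocation times.

//Note: the loop ends at the first empty line, which for this generated file is the end of the file.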

Console.WriteLine("Reading file assigning each line to StringBuilder: ");

start = DateTime.Now;

using (StreamReader sr = File.OpenText(fileName))

{

StringBuilder sb = new StringBuilder();

while (sb.Append(sr.ReadLine()).Length > 0)

{

TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work

sb.Clear();

}

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Reading each line into a StringBuilder, but setting the StringBuilder object to an initial

//size since we know how long the longest line in the file is.

Console.WriteLine("Reading file assigning each line to preset size StringBuilder: ");

start = DateTime.Now;

using (StreamReader sr = File.OpenText(fileName))

{

StringBuilder sb = new StringBuilder(g.Length);

while (sb.Append(sr.ReadLine()).Length > 0)

{

TestReadingAndProcessingLinesFromFile_DoStuff(sb.ToString()); //to simulate work

sb.Clear();

}

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

GC.Collect();

Thread.Sleep(1000); //give disk hardware time to recover

//Read each line into an array index.

Console.WriteLine("Reading each line into string array. Process with Parallel.For: ");

start = DateTime.Now;

try

{

AllLines = new string[MAX]; //only allocate memory here

using (StreamReader sr = File.OpenText(fileName))

{

int x = 0;

while (!sr.EndOfStream)

{

//we're just testing read speeds

AllLines[x] = sr.ReadLine();

x += 1;

}

} //CLOSE THE FILE because we are now DONE with it.

Parallel.For(0, AllLines.Length, x =>

{

TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]); //to simulate work

});

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (OutOfMemoryException)

{

end = DateTime.Now;

Console.WriteLine("Not enough memory. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (Exception)

{

end = DateTime.Now;

Console.WriteLine("EXCEPTION. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

finally

{

if (AllLines != null)

{

Array.Clear(AllLines, 0, AllLines.Length);

AllLines = null;

}

}

GC.Collect();

Thread.Sleep(1000);

//Read the entire file using File.ReadAllLines.

Console.WriteLine("Performing File ReadAllLines into array. Process with Parallel.For: ");

start = DateTime.Now;

try

{

AllLines = File.ReadAllLines(fileName); //ReadAllLines allocates and returns the array itself

Parallel.For(0, AllLines.Length, x =>

{

TestReadingAndProcessingLinesFromFile_DoStuff(AllLines[x]); //to simulate work

});

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (OutOfMemoryException)

{

end = DateTime.Now;

Console.WriteLine("Not enough memory. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

catch (Exception)

{

end = DateTime.Now;

Console.WriteLine("EXCEPTION. Couldn't perform this test.");

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine();

}

finally

{

if (AllLines != null)

{

Array.Clear(AllLines, 0, AllLines.Length);

AllLines = null;

}

}

File.Delete(fileName);

fileName = null;

GC.Collect();

}

//Just simulates doing work on a line read from an input file

static void TestReadingAndProcessingLinesFromFile_DoStuff(string s)

{

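//Split on spaces. Note: new char[' '] would allocate 32 '\0' characters and never split on a space.

string[] sa = s.Split(new char[] { ' ' });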

int[] ia = new int[sa.Length];

int num = 0;

for (int x = 0; x < sa.Length; x++)

{

foreach (char c in sa[x])

{

if (int.TryParse(c.ToString(), out num))

{ //just doing some bogus mathematical calculations to simulate work

ia[x] = (int)((Math.Sqrt(Math.Log(num) % Math.Log10(num))) * (Math.Log(Math.Log10(num) / Math.Sqrt(num))));

}

}

}

//clean up

Array.Clear(ia, 0, ia.Length);

Array.Clear(sa, 0, sa.Length);

ia = null;

sa = null;

}

} //class

} //namespace
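The listing above repeats the same start/end timing and exception-handling lines around every technique. As a rough sketch only (the names BenchmarkTimer and Measure are hypothetical and not part of the original source), that boilerplate could be folded into one helper. It also swaps DateTime.Now for System.Diagnostics.Stopwatch, which is a change from the original approach and is generally the better fit for measuring elapsed time:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original listing.
static class BenchmarkTimer
{
    // Runs one test, prints the elapsed time, and reports failures
    // with the same messages the original listing uses.
    public static TimeSpan Measure(string label, Action work)
    {
        Console.WriteLine(label);
        Stopwatch sw = Stopwatch.StartNew();
        try
        {
            work();
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Not enough memory. Couldn't perform this test.");
        }
        catch (Exception)
        {
            Console.WriteLine("EXCEPTION. Couldn't perform this test.");
        }
        sw.Stop();
        Console.WriteLine("Time: " + sw.Elapsed);
        Console.WriteLine();
        return sw.Elapsed;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in workload for illustration; a real run would pass one of
        // the file-reading techniques as the lambda instead.
        BenchmarkTimer.Measure("Summing square roots: ", () =>
        {
            double total = 0;
            for (int i = 1; i <= 1000000; i++)
            {
                total += Math.Sqrt(i);
            }
            Console.WriteLine("Total: " + total);
        });
    }
}
```

With a helper like this, each technique would shrink to a single Measure call followed by the GC.Collect() and Thread.Sleep(1000) pause, which should also make it harder to forget a timing line in one of the runs.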

By David Lozinski
