Spread the love

C#.Net: The Fastest Way to count substring occurrences in a string

This will examine many techniques to determine in C# .Net: the Fastest Way to count substring occurrences in a string.

Background:

This test stemmed from a project where I had to do a lot of substring searches. I couldn’t use some simple built-in methods like String.Contains() or IndexOf() straight out of the box because they only test to see if a substring exists, not how many times a string occurs in another string.

Initially I fell back on the good old tried, tested, and proven custom counting method that I’ve used numerous times. Also, if you followed my Fastest Way to check if a string occurs within a string benchmarks:

String.Contains() is a very fast method to use if possible
I knew I could use Regex, but tests have shown Regex to be very slow and not ideal.

So that’s when this curious consultant started wondering… are there other techniques to count how many times a substring occurs in a string? If so, what is the fastest way to count substring occurrences within another string?

The Set Up:

A C# Console application was written to test several counting techniques: 5 single-threaded; 5 multi-threaded.

The code is written in Visual Studio 2012 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system if you wish.

In a nutshell, the code does the following:

Creates an array of random strings that will serve as the strings to be searched using the System.Web.Security.Membership.GeneratePassword method. We’ll call these “A”.
Creates an array of random strings that will serve as the strings being searched for using the System.Web.Security.Membership.GeneratePassword method. We’ll call these “B”.
Executes the techniques to see if any of the search strings occur in the strings to be searched. The techniques are as follows where “ss” is the array of “Strings to Search” and “sf” is the array of strings to “Search For”:

Technique

Code Snippet

Common Counting

for (int y = 0; y &lt; sf.Length; y++)
{
    c[y] += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length;
}

for (int y = 0; y < sf.Length; y++)

{

c[y] += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length;

}

Code from DotNetPerls

for (int y = 0; y &lt; sf.Length; y++)
{
    z = 0;
    while ((z = ss[x].IndexOf(sf[y], z)) != -1)
    {
        z += sf[y].Length;
        c[y] += 1;
    }
}

for (int y = 0; y < sf.Length; y++)

{

z = 0;

while ((z = ss[x].IndexOf(sf[y], z)) != -1)

{

z += sf[y].Length;

c[y] += 1;

}

String.Split; count results.

for (int y = 0; y &lt; sf.Length; y++)
{
    c[y] += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1;
}

for (int y = 0; y < sf.Length; y++)

{

c[y] += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1;

}

C# Regex

for (int y = 0; y &lt; sf.Length; y++)
{
    c[y] += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count;
}

for (int y = 0; y < sf.Length; y++)

{

c[y] += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count;

}

C# Regex w/ AsParallel()

total += sf.AsParallel().Sum(s =&gt; Regex.Matches(ss[x], Regex.Escape(s)).Count);

1	total += sf.AsParallel().Sum(s => Regex.Matches(ss[x], Regex.Escape(s)).Count);

T1 implemented with a multi-threaded Parallel.For loop

T2 implemented with a multi-threaded Parallel.For loop

T3 implemented with a multi-threaded Parallel.For loop

T4 implemented with a multi-threaded Parallel.For loop

T10

T5 implemented with a multi-threaded Parallel.For loop

The sum from adding up the number of times a match is found is displayed along with the execution time
The arrays are cleared out and deleted being the responsible programmer I am. 🙂

The code assumes all tests will happen with no exception testing because I’m just wanting to test the raw speed of finding substrings in a string. There may be other techniques, but I couldn’t find or think of any. However, if you have a great alternative solution, please post!

It also assumes that substrings do not overlap.

The exe file was installed and run on an Alienware M17X R3 running Windows 7 64-bit with 16 GB memory on an i7-2820QM processor.
The test was run for each technique over the following number of comparisons:

Counting the number of times 1 string occurs in 5,000, 25,000, 100,000, and 1,000,000 million strings.
Counting the number of times 100 strings occur in 5,000, 25,000, 100,000, and 1,000,000 million strings.
Counting the number of times 1,000 strings occur in 5,000, 25,000, 100,000, and 1,000,000 million strings.

Yes in most cases, programmers are only going to test if 1 or 2 strings occur in another string. But:

We’re having fun aren’t we?
Since the project I worked on had several hundred thousand strings that needed to be searched, I was definitely interested in how all the techniques handled larger quantities.

The Runs:

Before starting, my hypothesis was that I expected the variations of the custom counting method to be the winner since it generally does perform the best whether counting substring occurrences in strings or characters in strings. It’s amazing how fast and versatile this method is.

Let’s see what happened on my machine.

All times are indicated in hours:minutes:seconds.milliseconds format. Yes, some took over an hour! Lower numbers are better because they indicate faster performance.

Winners are highlighted in green.

Counting the number of occurrences of 1 string in each of
Run #1:	5,000	25,000	100,000	1,000,000
T1: Common Counting	0.00	0.0010000	0.0070004	00.0710040
T2: Code from DotNetPerls	0.0010000	0.00	0.0130008	00.1330082
T3: Split String Count	0.0020001	0.0070004	0.0270017	00.2520133
T4: Regex.Matches.Count	0.0040003	0.0170010	0.0580033	00.5440323
T5: AsParallel w/ Regex	0.1570085	0.5600008	2.4421396	29.8928806

T6: T1 using Parallel.For	0.0190010	0.0010001	0.0020001	0.0210012
T7: T2 using Parallel.For	0.0020002	0.0010000	0.0040003	0.0380021
T8: T3 using Parallel.For	0.0030001	0.0100001	0.0140008	0.1800048
T9: T4 using Parallel.For	0.0060004	0.0200010	0.1020036	1.7210413
T10: T5 using Parallel.For	5.5300077	0.8320046	5.3341221	2:36.9828740

Counting the number of occurrences of 100 strings in each of
Run #2:	5,000	25,000	100,000	1,000,000
T1: Common Counting	0.0310018	00.1570090	00.6170353	0:06.0542581
T2: Code from DotNetPerls	0.0580033	00.2930167	01.1640666	0:11.3702949
T3: Split String Count	0.0890051	00.4480256	02.1521002	0:16.9683032
T4: Regex.Matches.Count	2.0531174	10.3155901	43.2109377	6:30.6880561
T5: AsParallel w/ Regex	2.7730793	14.1273355	51.6757052	8:12.0271304

T6: T1 using Parallel.For	0.0070004	00.0380021	00.1420059	00:01.5230843
T7: T2 using Parallel.For	0.0180010	00.0750049	00.3050192	00:03.3811934
T8: T3 using Parallel.For	0.1030022	00.6790102	00.7300389	00:07.3602943
T9: T4 using Parallel.For	1:25.7861022	8:01.4317111	42.4874429	10:36.6637675
T10: T5 using Parallel.For	1:46.1456623	9:01.4171999	41.8417822	27:55.6335064

Counting the number of occurrences of 1,000 strings in each of
Run #3:	5,000	25,000	100,000	1,000,000
T1: Common Counting	00.2900004	0:01.4860621	00:05.8750850	0:00:58.7494732
T2: Code from DotNetPerls	00.5600007	0:02.7800039	00:11.3132955	0:01:53.0386768
T3: Split String Count	00.8400012	0:04.2170303	00:16.8953990	0:02:50.4861649
T4: Regex.Matches.Count	20.5655871	1:47.1916370	06:46.6989794	1:05:35.8084806
T5: AsParallel w/ Regex	16.6234057	1:24.2900854	05:36.0162879	0:56:57.8961879

T6: T1 using Parallel.For	00.0810018	0:00.3720185	00:01.5030859	0:00:15.1558518
T7: T2 using Parallel.For	00.1710098	0:00.8580491	00:03.3751931	0:00:33.7619311
T8: T3 using Parallel.For	00.3880016	0:01.8071199	00:08.6232645	0:02:03.7490701
T9: T4 using Parallel.For	20.0905242	1:40.5890094	07:46.7686951	1:12:04.7092390
T10: T5 using Parallel.For	15.5083999	1:21.5940674	05:02.5684600	0:52:34.3056667

The Results:

The technique T1 implementing a common counting method won by running faster every single time. In more than half the tests it completed roughly twice as fast as the next fastest technique, T2 from DotNetPerls. As the old saying goes, “green across the board”.

T1, T2, & their multi-threaded equivalents dominated.

The real surprise was how the single-threaded T1, in most cases, ran faster than half the multi-threaded parallel techniques even at the higher string counts!

Regex is a powerful tool, but as one can see from the results, if you know you’re going to perform counts 100 times or more for a simple string rather than a complicated pattern, Regex should not be used.

The other techniques were respectable: not the greatest, not the worst. They only seriously started falling behind when counting 100 or more substrings in more than 1000 strings, which as I stated probably doesn’t happen too often in real world programming applications.

In Summary:

On my system, unless someone spots a flaw in my test code, Technique 1 with the common counting method was the single-threaded winner. Implementing the parallel version was the multi-threaded winner. Any application that needs micro optimization or where speed is of the essence, nothing else should be considered.

Otherwise, there are two other techniques that can be utilized. It really comes down to personal taste and how much one cares about code readability versus speed.

If you’re only going to be counting a few substrings within a small number of search strings, you can consider Regex as a readable alternative. Again it’s that speed versus readability trade-off. If you’re going to count more than just a few substrings occurrences, either don’t use Regex or consider taking a long lunch break while waiting for the results.

The Code:

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Threading;

namespace TestApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime end;
            DateTime start = DateTime.Now;

            Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());
            Console.WriteLine();

            TestFastestWayToCountSubstringOccurrencesInAString(5000, 1);
            TestFastestWayToCountSubstringOccurrencesInAString(25000, 1);
            TestFastestWayToCountSubstringOccurrencesInAString(100000, 1);
            TestFastestWayToCountSubstringOccurrencesInAString(1000000, 1);
            TestFastestWayToCountSubstringOccurrencesInAString(5000, 100);
            TestFastestWayToCountSubstringOccurrencesInAString(25000, 100);
            TestFastestWayToCountSubstringOccurrencesInAString(100000, 100);
            TestFastestWayToCountSubstringOccurrencesInAString(1000000, 100);
            TestFastestWayToCountSubstringOccurrencesInAString(5000, 1000);
            TestFastestWayToCountSubstringOccurrencesInAString(25000, 1000);
            TestFastestWayToCountSubstringOccurrencesInAString(100000, 1000);
            TestFastestWayToCountSubstringOccurrencesInAString(1000000, 1000);

            end = DateTime.Now;
            Console.WriteLine();
            Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());
            Console.WriteLine("### Overall Run Time: " + (end - start));

            Console.WriteLine();
            Console.WriteLine("Hit Enter to Exit");
            Console.ReadLine();

        }

        //###############################################################

        //what is the fastest to count the number of occurrences of a substring
        //within a string?

        static void TestFastestWayToCountSubstringOccurrencesInAString(int NumberOfStringsToGenerate, int NumberOfSearchStringsToGenerate)
        {
            Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);
            Console.WriteLine("Number of Random Strings that will be generated: " + NumberOfStringsToGenerate.ToString("#,##0"));
            Console.WriteLine("Number of Search Strings that will be generated: " + NumberOfSearchStringsToGenerate.ToString("#,##0"));
            Console.WriteLine();

            int total = 0;
            object lockObject = new object();
            DateTime end = DateTime.Now;
            DateTime start = DateTime.Now;
            //the strings to search
            string[] ss = new string[NumberOfStringsToGenerate];
            //the strings to look for
            string[] sf = new string[NumberOfSearchStringsToGenerate];
            //the count of each substring finding
            int[] c = new int[sf.Length];

            //Generate the string arrays
            int z = 10;

            //strings to be searched. Completely random. Using generate password method to come up with all sorts of mixtures.
			Console.WriteLine("Generating strings to search.");
			for (int x = 0; x &lt; ss.Length; x++)             {                 ss[x] = System.Web.Security.Membership.GeneratePassword(z, x%5);                 z += 1;                 if (z &gt; 25)
                    z = 10;
            }

            //strings to search for
			Console.WriteLine("Generating strings to search for.");
			z = 2;
            for (int x = 0; x &lt; sf.Length; x++ )             {                 sf[x] = System.Web.Security.Membership.GeneratePassword(z, x%2);                 z += 1;                 if (z &gt; 8)
                    z = 2;
            }

			Console.WriteLine("###########################################################");
			Console.WriteLine();

            //return (strToSearch.Length - strToSearch.Replace(strKeyToLookFor, String.Empty).Length) / strKeyToLookFor.Length;
            Console.WriteLine("Starting method: \"custom method\".");
            start = DateTime.Now;   
            for (int x = 0; x &lt; ss.Length; x++)
            {
                for (int y = 0; y &lt; sf.Length; y++)
                {
                    c[y] += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length;
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            total = 0;
            for (int x = 0; x &lt; c.Length; x++ )
            {
                total += c[x];
            }
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            //http://www.dotnetperls.com/string-occurrence
            Array.Clear(c, 0, c.Length);
            Console.WriteLine("Starting method: \"CountStringOccurrencesFromDotNetPerls\".");
            start = DateTime.Now;
            for (int x = 0; x &lt; ss.Length; x++)
            {
                for (int y = 0; y &lt; sf.Length; y++)
                {
                    z = 0;
                    while ((z = ss[x].IndexOf(sf[y], z)) != -1)
                    {
                        z += sf[y].Length;
                        c[y] += 1;
                    }
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            total = 0;
            for (int x = 0; x &lt; c.Length; x++)
            {
                total += c[x];
            }
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            Array.Clear(c, 0, c.Length);
            Console.WriteLine("Starting method: \"string split count\".");
            start = DateTime.Now;
            for (int x = 0; x &lt; ss.Length; x++)
            {
                for (int y = 0; y &lt; sf.Length; y++)
                {
                    c[y] += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1;
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            total = 0;
            for (int x = 0; x &lt; c.Length; x++)
            {
                total += c[x];
            }
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            Array.Clear(c, 0, c.Length);
            Console.WriteLine("Starting method: \"regex uncompiled\".");
            start = DateTime.Now;
            for (int x = 0; x &lt; ss.Length; x++)
            {
                for (int y = 0; y &lt; sf.Length; y++)
                {
                    c[y] += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count;
                }
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            total = 0;
            for (int x = 0; x &lt; c.Length; x++)
            {
                total += c[x];
            }
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Single AsParallel() Regex\".");
            start = DateTime.Now;
            for (int x = 0; x &lt; ss.Length; x++)             {                 total += sf.AsParallel().Sum(s =&gt; Regex.Matches(ss[x], Regex.Escape(s)).Count);
            }
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Parallel For using custom counting\".");
            start = DateTime.Now;
            Parallel.For(0, ss.Length,
                () =&gt; 0,
                (x, loopState, subtotal) =&gt;
                {
                    for (int y = 0; y &lt; sf.Length; y++)                     {                         subtotal += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length;                     }                     return subtotal;                 },                 (s) =&gt;
                {
                    lock (lockObject)
                    {
                        total += s;
                    }
                }
            );
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Parallel For using DotNetPerls technique\".");
            start = DateTime.Now;
            Parallel.For(0, ss.Length,
                () =&gt; 0,
                (x, loopState, subtotal) =&gt;
                {
                    int a = 0;
                    for (int y = 0; y &lt; sf.Length; y++)                     {                         a = 0;                         while ((a = ss[x].IndexOf(sf[y], a)) != -1)                         {                             a += sf[y].Length;                             subtotal += 1;                         }                     }                     return subtotal;                 },                 (s) =&gt;
                {
                    lock (lockObject)
                    {
                        total += s;
                    }
                }
            );
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Parallel For using Split String\".");
            start = DateTime.Now;
            Parallel.For(0, ss.Length,
                () =&gt; 0,
                (x, loopState, subtotal) =&gt;
                {
                    for (int y = 0; y &lt; sf.Length; y++)                     {                         subtotal += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1;                     }                     return subtotal;                 },                 (s) =&gt;
                {
                    lock (lockObject)
                    {
                        total += s;
                    }
                }
            );
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Parallel For using Regex Matches Count\".");
            start = DateTime.Now;
            Parallel.For(0, ss.Length,
                () =&gt; 0,
                (x, loopState, subtotal) =&gt;
                {
                    for (int y = 0; y &lt; sf.Length; y++)                     {                         subtotal += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count;                     }                     return subtotal;                 },                 (s) =&gt;
                {
                    lock (lockObject)
                    {
                        total += s;
                    }
                }
            );
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            total = 0;
            Console.WriteLine("Starting method: \"Parallel For AsParallel() Regex\".");
            start = DateTime.Now;
            Parallel.For(0, ss.Length,
                () =&gt; 0,
                (x, loopState, subtotal) =&gt;
                {
                    subtotal += sf.AsParallel().Sum(s =&gt; Regex.Matches(ss[x], Regex.Escape(s)).Count);
                    return subtotal;
                },
                (s) =&gt;
                {
                    lock (lockObject)
                    {
                        total += s;
                    }
                }
            );
            end = DateTime.Now;
            Console.WriteLine("Finished at: " + end.ToLongTimeString());
            Console.WriteLine("Time: " + (end - start));
            Console.WriteLine("Total finds: " + total + Environment.NewLine);
            Console.WriteLine();
            Console.WriteLine("###########################################################");
            Console.WriteLine();

            lockObject = null;
            Array.Clear(ss, 0, ss.Length);
            ss = null;
            Array.Clear(sf, 0, sf.Length);
            sf = null;
            Array.Clear(c, 0, c.Length);
            c = null;
            GC.Collect();

        }
    }
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

using System;

using System.Linq;

using System.Text;

using System.Text.RegularExpressions;

using System.Threading.Tasks;

using System.Threading;

namespace TestApplication

{

class Program

{

static void Main(string[] args)

{

DateTime end;

DateTime start = DateTime.Now;

Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());

Console.WriteLine();

TestFastestWayToCountSubstringOccurrencesInAString(5000, 1);

TestFastestWayToCountSubstringOccurrencesInAString(25000, 1);

TestFastestWayToCountSubstringOccurrencesInAString(100000, 1);

TestFastestWayToCountSubstringOccurrencesInAString(1000000, 1);

TestFastestWayToCountSubstringOccurrencesInAString(5000, 100);

TestFastestWayToCountSubstringOccurrencesInAString(25000, 100);

TestFastestWayToCountSubstringOccurrencesInAString(100000, 100);

TestFastestWayToCountSubstringOccurrencesInAString(1000000, 100);

TestFastestWayToCountSubstringOccurrencesInAString(5000, 1000);

TestFastestWayToCountSubstringOccurrencesInAString(25000, 1000);

TestFastestWayToCountSubstringOccurrencesInAString(100000, 1000);

TestFastestWayToCountSubstringOccurrencesInAString(1000000, 1000);

end = DateTime.Now;

Console.WriteLine();

Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());

Console.WriteLine("### Overall Run Time: " + (end - start));

Console.WriteLine();

Console.WriteLine("Hit Enter to Exit");

Console.ReadLine();

}

//###############################################################

//what is the fastest to count the number of occurrences of a substring

//within a string?

static void TestFastestWayToCountSubstringOccurrencesInAString(int NumberOfStringsToGenerate, int NumberOfSearchStringsToGenerate)

{

Console.WriteLine("######## " + System.Reflection.MethodBase.GetCurrentMethod().Name);

Console.WriteLine("Number of Random Strings that will be generated: " + NumberOfStringsToGenerate.ToString("#,##0"));

Console.WriteLine("Number of Search Strings that will be generated: " + NumberOfSearchStringsToGenerate.ToString("#,##0"));

Console.WriteLine();

int total = 0;

object lockObject = new object();

DateTime end = DateTime.Now;

DateTime start = DateTime.Now;

//the strings to search

string[] ss = new string[NumberOfStringsToGenerate];

//the strings to look for

string[] sf = new string[NumberOfSearchStringsToGenerate];

//the count of each substring finding

int[] c = new int[sf.Length];

//Generate the string arrays

int z = 10;

//strings to be searched. Completely random. Using generate password method to come up with all sorts of mixtures.

Console.WriteLine("Generating strings to search.");

for (int x = 0; x < ss.Length; x++) { ss[x] = System.Web.Security.Membership.GeneratePassword(z, x%5); z += 1; if (z > 25)

z = 10;

}

//strings to search for

Console.WriteLine("Generating strings to search for.");

z = 2;

for (int x = 0; x < sf.Length; x++ ) { sf[x] = System.Web.Security.Membership.GeneratePassword(z, x%2); z += 1; if (z > 8)

z = 2;

}

Console.WriteLine("###########################################################");

Console.WriteLine();

//return (strToSearch.Length - strToSearch.Replace(strKeyToLookFor, String.Empty).Length) / strKeyToLookFor.Length;

Console.WriteLine("Starting method: \"custom method\".");

start = DateTime.Now;

for (int x = 0; x < ss.Length; x++)

{

for (int y = 0; y < sf.Length; y++)

{

c[y] += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length;

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

total = 0;

for (int x = 0; x < c.Length; x++ )

{

total += c[x];

}

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

//http://www.dotnetperls.com/string-occurrence

Array.Clear(c, 0, c.Length);

Console.WriteLine("Starting method: \"CountStringOccurrencesFromDotNetPerls\".");

start = DateTime.Now;

for (int x = 0; x < ss.Length; x++)

{

for (int y = 0; y < sf.Length; y++)

{

z = 0;

while ((z = ss[x].IndexOf(sf[y], z)) != -1)

{

z += sf[y].Length;

c[y] += 1;

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

total = 0;

for (int x = 0; x < c.Length; x++)

{

total += c[x];

}

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

Array.Clear(c, 0, c.Length);

Console.WriteLine("Starting method: \"string split count\".");

start = DateTime.Now;

for (int x = 0; x < ss.Length; x++)

{

for (int y = 0; y < sf.Length; y++)

{

c[y] += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1;

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

total = 0;

for (int x = 0; x < c.Length; x++)

{

total += c[x];

}

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

Array.Clear(c, 0, c.Length);

Console.WriteLine("Starting method: \"regex uncompiled\".");

start = DateTime.Now;

for (int x = 0; x < ss.Length; x++)

{

for (int y = 0; y < sf.Length; y++)

{

c[y] += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count;

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

total = 0;

for (int x = 0; x < c.Length; x++)

{

total += c[x];

}

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Single AsParallel() Regex\".");

start = DateTime.Now;

for (int x = 0; x < ss.Length; x++) { total += sf.AsParallel().Sum(s => Regex.Matches(ss[x], Regex.Escape(s)).Count);

}

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Parallel For using custom counting\".");

start = DateTime.Now;

Parallel.For(0, ss.Length,

() => 0,

(x, loopState, subtotal) =>

{

for (int y = 0; y < sf.Length; y++) { subtotal += (ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length; } return subtotal; }, (s) =>

{

lock (lockObject)

{

total += s;

}

);

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Parallel For using DotNetPerls technique\".");

start = DateTime.Now;

Parallel.For(0, ss.Length,

() => 0,

(x, loopState, subtotal) =>

{

int a = 0;

for (int y = 0; y < sf.Length; y++) { a = 0; while ((a = ss[x].IndexOf(sf[y], a)) != -1) { a += sf[y].Length; subtotal += 1; } } return subtotal; }, (s) =>

{

lock (lockObject)

{

total += s;

}

);

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Parallel For using Split String\".");

start = DateTime.Now;

Parallel.For(0, ss.Length,

() => 0,

(x, loopState, subtotal) =>

{

for (int y = 0; y < sf.Length; y++) { subtotal += ss[x].Split(new string[] { sf[y] }, StringSplitOptions.None).Count() - 1; } return subtotal; }, (s) =>

{

lock (lockObject)

{

total += s;

}

);

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Parallel For using Regex Matches Count\".");

start = DateTime.Now;

Parallel.For(0, ss.Length,

() => 0,

(x, loopState, subtotal) =>

{

for (int y = 0; y < sf.Length; y++) { subtotal += Regex.Matches(ss[x], Regex.Escape(sf[y])).Count; } return subtotal; }, (s) =>

{

lock (lockObject)

{

total += s;

}

);

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

total = 0;

Console.WriteLine("Starting method: \"Parallel For AsParallel() Regex\".");

start = DateTime.Now;

Parallel.For(0, ss.Length,

() => 0,

(x, loopState, subtotal) =>

{

subtotal += sf.AsParallel().Sum(s => Regex.Matches(ss[x], Regex.Escape(s)).Count);

return subtotal;

(s) =>

{

lock (lockObject)

{

total += s;

}

);

end = DateTime.Now;

Console.WriteLine("Finished at: " + end.ToLongTimeString());

Console.WriteLine("Time: " + (end - start));

Console.WriteLine("Total finds: " + total + Environment.NewLine);

Console.WriteLine();

Console.WriteLine("###########################################################");

Console.WriteLine();

lockObject = null;

Array.Clear(ss, 0, ss.Length);

ss = null;

Array.Clear(sf, 0, sf.Length);

sf = null;

Array.Clear(c, 0, c.Length);

c = null;

GC.Collect();

}

(Visited 17,452 times, 1 visits today)

Spread the love

C#.Net: The Fastest Way to count substring occurrences in a string

ByDavid Lozinski

C#.Net: The Fastest Way to count substring occurrences in a string

Background:

The Set Up:

The Runs:

The Results:

In Summary:

The Code:

Related Post

Math.Max/Min vs inline comparisons

Division Vs Multiplication Equivalent

Does caching calculated loop indexes make a difference?

You missed

Comparing Microsoft Copilot, Copilot M365, and ChatGPT

Dating Multiple AI’s

Choosing CloudFlare over Microsoft Azure

Convert a Text field to Proper Case using Power Automate