9

Fastest Collection for String Lookups in C# .Net

Spread the love

Fastest Collection for String Lookups in C# .Net

All C# programmers have had to do this at some point in their career… look for a particular string as some sort of key or value within a data structure. But how do the multitude of data structures compare against each other for the fastest lookup of said key or value? What is the fastest collection for string lookups in C#?

 

The Background:

In C#, some of the most common collections for storing data are arrays, lists, dictionaries, and collections based on hashes. Of these, some allow for the storage of “keys” as strings; others only allow strings to be stored as “values”; and there are some which take the middle ground of allowing strings to be stored as both a “key” and a “value”.

This test stemmed from several projects where the code created dictionary objects or HashSets as in-memory references to see if particular records already existed with the particular string as a key.

I saw a lot of conditions similar to:

So that’s when this curious consultant started wondering… what’s the fastest collection for string lookups in C# .Net? Whether by key or value?

 

Here’s the Setup:

I came up with a list of collections I believe are commonly used, and broke them into two groups: those that allow for “strings” to be used as “keys” for lookups, and those that allow for strings to be used as “values”. The list is divided as follows:

 

Searching Keys:

 

Searching Values:

 

I wrote a C# Console application to test several methods of key and value lookups.

The code is written in Visual Studio 2013 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system if you wish.

 

In a nutshell, the code does the following:

  1. Creates the storage objects, pre-sizing them where possible.
  2. Generates the random strings to search. Each object is populated with the exact same strings.
  3. Also generates the strings to search for, purposely copying 1 of every 3 random strings to search to make sure we have at least a few matches.
  4. Searches the objects by their key, value, or both, implementing any of the various techniques.
  5. Whenever a match is found, a counter is incremented. This is to ensure every technique finds the exact same amount.

 

The code tests both the number and length of the strings as I was curious if the length of the strings impacted the performance at all.

The exe file was run on Windows 7 64-bit with 16 GB memory.

The test was run for the following comparisons:

  • Searching 100 strings for strings that are 12, 50, and 128 characters in length
  • Searching 5,000 strings for strings that are 12, 50, and 128 characters in length
  • Searching 25,000 strings for strings that are 12, 50, and 128 characters in length
  • Searching 100,000 strings for strings that are 12, 50, and 128 characters in length

 

The Runs:

My prediction is the Dictionary (by key), Sorted Dictionary (by key), and HashSet would provide the fastest overall performance; when it comes to Arrays and Lists, I think it will be a toss up between the binary searches (as the amount of data gets larger) and the Array custom method.

Let’s see what happened on my machine.

All times are indicated in minutes:seconds.milliseconds format. Lower numbers indicate faster run time performance.

The winners are highlighted in green:

 

Search Performance:

# to Search:

100

String Lengths:

12

50

128

——– Single Threaded “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00

Sorted Dictionary (by key)

00:00

00:00

00:00

Concurrent Dictionary (by key)

00:00

00:00

00:00

HashSet

00:00

00:00

00:00

HashTable (by key)

00:00

00:00

00:00

 

 

——– Single Threaded “Value” Search: ——–

Array

00:00

00:00

00:00

Array custom method

00:00

00:00

00:00

Array: Binary Search

00:00

00:00

00:00

Array List – Contains

00:00

00:00

00:00

Array List – For Loop

00:00

00:00

00:00

List – Contains

00:00

00:00

00:00

List – For Loop

00:00

00:00

00:00

List: Binary Search

00:00

00:00

00:00

Sorted List

00:00

00:00

00:00

Dictionary (by value)

00:00

00:00

00:00

Sorted Dictionary (by value)

00:00.0312001

00:00

00:00

Concurrent Dictionary (by value)

00:00

00:00

00:00

HashTable (by value)

00:00

00:00

00:00

 

 

——– Parallel “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00

Sorted Dictionary (by key)

00:00

00:00

00:00

Concurrent Dictionary (by key)

00:00

00:00

00:00

HashSet

00:00

00:00

00:00

HashTable (by key)

00:00

00:00

00:00

 

 

——– Parallel “Value” Search: ——–

Array

00:00.2652005

00:00

00:00

Array – Custom method

00:00

00:00

00:00

Array: Binary Search

00:00

00:00

00:00

Array List – Contains

00:00

00:00

00:00

Array List – For Loop

00:00

00:00

00:00

List – Contains

00:00

00:00

00:00

List – For Loop

00:00

00:00

00:00

List: Binary Search

00:00

00:00

00:00

Sorted List

00:00

00:00

00:00

Dictionary (by value)

00:00

00:00

00:00

Sorted Dictionary (by value)

00:00

00:00

00:00

Concurrent Dictionary (by value)

00:00

00:00

00:00

HashTable (by value)

00:00

00:00

00:00

 

 

SearchPerformance:

# to Search:

5,000

String Lengths:

12

50

128

——– Single Threaded “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00

Sorted Dictionary (by key)

00:00.0156000

00:00.0156000

00:00.0156000

Concurrent Dictionary (by key)

00:00

00:00

00:00

HashSet

00:00

00:00

00:00

HashTable (by key)

00:00

00:00

00:00

 

 

——– Single Threaded “Value” Search: ——–

Array

00:01.4352025

00:01.4508026

00:01.4196025

Array custom method

00:00.8892016

00:00.8580015

00:00.8736016

Array: Binary Search

00:00.0156000

00:00.0156000

00:00.0156001

Array List – Contains

00:00.2028003

00:00.2028004

00:00.2184004

Array List – For Loop

00:00.0936002

00:00.1246002

00:00.1248002

List – Contains

00:00.2340005

00:00.2652004

00:00.2652005

List – For Loop

00:00.2184004

00:00.2184004

00:00.2184003

List: Binary Search

00:00.0156000

00:00.0156000

00:00.0156000

Sorted List

00:00.0156000

00:00.0156000

00:00.0156001

Dictionary (by value)

00:00.2496004

00:00.2652004

00:00.2652005

Sorted Dictionary (by value)

00:00.9672017

00:00.7488013

00:00.7332013

Concurrent Dictionary (by value)

00:01.0140018

00:01.0296018

00:01.0140018

HashTable (by value)

00:00.2964005

00:00.2808005

00:00.2964006

 

 

——– Parallel “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00

Sorted Dictionary (by key)

00:00

00:00

00:00

Concurrent Dictionary (by key)

00:00

00:00

00:00

HashSet

00:00

00:00

00:00

HashTable (by key)

00:00

00:00

00:00

 

 

——– Parallel “Value” Search: ——–

Array

00:00.3900007

00:00.3744006

00:00.3900006

Array – Custom method

00:00.2340004

00:00.2184004

00:00.2340004

Array: Binary Search

00:00

00:00

00:00

Array List – Contains

00:00.0312000

00:00.0624001

00:00.0780001

Array List – For Loop

00:00.0156000

00:00.0468001

00:00.0468001

List – Contains

00:00.0780002

00:00.0780001

00:00.0780001

List – For Loop

00:00.0468001

00:00.0624001

00:00.0780001

List: Binary Search

00:00

00:00

00:00

Sorted List

00:00

00:00

00:00

Dictionary (by value)

00:00.0936002

00:00.0936002

00:00.0936001

Sorted Dictionary (by value)

00:00.1560002

00:00.3588006

00:00.1872003

Concurrent Dictionary (by value)

00:01.2012021

00:01.2168021

00:01.2168021

HashTable (by value)

00:00.0936001

00:00.0936002

00:00.0936001

 

 

SearchPerformance:

# to Search:

25,000

String Lengths:

12

50

128

——– Single Threaded “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00.0156000

Sorted Dictionary (by key)

00:00.0624001

00:000.624001

00:00.0780001

Concurrent Dictionary (by key)

00:00

00:00

00:00.0156000

HashSet

00:00

00:00

00:00.0156001

HashTable (by key)

00:00

00:00

00:00.0156001

 

 

——– Single Threaded “Value” Search: ——–

Array

00:36.9408649

00:39.0936686

00:42.9312754

Array custom method

00:20.5452361

00:22.5576396

00:21.0600370

Array: Binary Search

00:00.0780001

00:00.0624001

00:00.0624001

Array List – Contains

00:04.3524076

00:07.9248139

00:07.9248139

Array List – For Loop

00:02.3712042

00:02.3712042

00:02.3712042

List – Contains

00:05.5068096

00:08.2680146

00:10.1244177

List – For Loop

00:04.7112083

00:06.4896114

00:07.0356124

List: Binary Search

00:00.0624001

00:00.0480002

00:00.0468000

Sorted List

00:00.0468000

00:00.0468001

00:00.0624001

Dictionary (by value)

00:05.9592105

00:09.5160167

00:10.9824193

Sorted Dictionary (by value)

00:18.0492318

00:22.8228401

00:25.7868453

Concurrent Dictionary (by value)

00:36.2700637

00:39.9984703

00:47.2368830

HashTable (by value)

00:07.3788129

00:07.6440134

00:07.7688136

 

 

——– Parallel “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00

Sorted Dictionary (by key)

00:00.0312000

00:00.0312001

00:00.0312000

Concurrent Dictionary (by key)

00:00

00:00

00:00

HashSet

00:00

00:00

00:00

HashTable (by key)

00:00

00:00

00:00

 

 

——– Parallel “Value” Search: ——–

Array

00:09.7188171

00:09.9060174

00:09.6876170

Array – Custom method

00:05.1948091

00:05.2884093

00:05.3508094

Array: Binary Search

00:00.0312000

00:00.0312000

00:00.0156000

Array List – Contains

00:01.0296018

00:01.3728024

00:01.4976026

Array List – For Loop

00:00.6864012

00:00.6708012

00:00.6552011

List – Contains

00:01.4196025

00:01.6848029

00:01.4976026

List – For Loop

00:00.9516016

00:01.2012021

00:00.6552011

List: Binary Search

00:00.0312001

00:00.0312001

00:00.0156000

Sorted List

00:00.0156000

00:00.0156001

00:00.0156000

Dictionary (by value)

00:01.4508026

00:01.8096032

00:02.1216037

Sorted Dictionary (by value)

00:03.7752066

00:04.1184073

00:04.2588075

Concurrent Dictionary (by value)

00:44.5848783

00:47.0496827

00:51.3552902

HashTable (by value)

00:01.7472030

00:01.7940032

00:01.9188034

 

 

SearchPerformance:

# to Search:

100,000

String Lengths:

12

50

128

——– Single Threaded “Key” Search: ——–

Dictionary (by key)

00:00.0312001

00:00.0468001

00:00.0468001

Sorted Dictionary (by key)

00:00.2184004

00:00.2340004

00:00.2496004

Concurrent Dictionary (by key)

00:00.0312000

00:00.0312001

00:00.0468001

HashSet

00:00.1560000

00:00.0312001

00:00.0468000

HashTable (by key)

00:00.0312000

00:00.0156000

00:00.0468001

 

 

——– Single Threaded “Value” Search: ——–

Array

11:36.0108225

13:46.5674518

17:01.7081945

Array custom method

08:21.6500811

10:02.8786589

11:23.7336009

Array: Binary Search

00:00.1872003

00:00.2184004

00:00.2340004

Array List – Contains

02:59.1447150

03:46.2315974

04:16.9324513

Array List – For Loop

00:37.5024659

00:37.3932657

00:37.3308655

List – Contains

03:28.6347665

04:27.4780698

05:02.8277319

List – For Loop

02:57.1071110

03:37.6359822

05:01.3457293

List: Binary Search

00:00.1872003

00:00.2028004

00:00.2340004

Sorted List

00:00.1716003

00:00.1872003

00:00.1716003

Dictionary (by value)

03:34.7031772

04:34.3576819

05:11.0489463

Sorted Dictionary (by value)

18:41.6731701

20:04.8681162

20:40.2333784

Concurrent Dictionary (by value)

21:37.0018781

22:53.0208115

24:20.6929656

HashTable (by value)

05:59.2842311

06:07.3494452

06:13.7766565

 

 

——– Parallel “Key” Search: ——–

Dictionary (by key)

00:00

00:00

00:00.0156001

Sorted Dictionary (by key)

00:00.0780001

00:00.0780001

00:00.0780002

Concurrent Dictionary (by key)

00:00

00:00.0156001

00:00.0156000

HashSet

00:00

00:00

00:00.0156000

HashTable (by key)

00:00

00:00

00:00.0156000

 

 

——– Parallel “Value” Search: ——–

Array

03:14.2047411

03:32.1291726

03:32.5035732

Array – Custom method

01:48.4669905

02:00.3542114

01:49.4809923

Array: Binary Search

00:00.0780001

00:00.0780001

00:00.0780001

Array List – Contains

00:46.6596819

00:55.8168981

00:57.7201014

Array List – For Loop

00:11.5752203

00:11.5284203

00:11.3412199

List – Contains

00:51.8232910

01:06.3625165

01:14.2405304

List – For Loop

00:43.5396765

00:52.3692920

00:47.2056830

List: Binary Search

00:00.0624001

00:00.0780001

00:00.0780001

Sorted List

00:00.0780001

00:00.0624001

00:00.0624001

Dictionary (by value)

00:56.6748995

01:17.2045356

01:23.1793460

Sorted Dictionary (by value)

02:53.8779054

03:26.0919620

03:32.7843738

Concurrent Dictionary (by value)

13:52.1210615

14:07.0034877

15:20.4172166

HashTable (by value)

01:17.7817367

01:20.4337412

01:22.4773449

 

The Results:

When it came to searching 100 strings (either as keys or values), most of the techniques registered 00:00 time.

However, once I started increasing the number of strings to search, the results became a lot more interesting.

For starters, the SortedDictionary performed miserably compared to the other collections that allow for key lookups; even worse when searching its values. All the other collections that allowed for key lookups performed admirably.

The collections where we could only search for strings as part of the “value” had wide and varied performances. The top 3 performers were the SortedList, and doing binary searches on both a regular List and an array.

None of the collection types that implement the use of a “key” performed well when searching their “values”.

Similarly, anything that implemented LINQ using Contains was slow. (No surprises there! There should be a setting in Visual Studio to banish LINQ when needing to develop high speed code.)

 

In Summary:

On my system, unless someone spots a flaw in my test code, if you are doing just a few calls or searches here and there, almost anything should suit your needs and perform fine.

 

However, if you have a “need for speed” or are doing a heavy amount of query lookups, then if you need to search for a collection for a string:

  1. in the “key”, A Dictionary, ConcurrentDictionary, or Hashset are your best bets.
  2. In the “value”, you should use a SortedList, or convert to an array/normal List and do a binary search.

 

By all means, do not use LINQ. The performance difference can be summed up as follows: LINQ is like trying to melt butter with a match; whereas the top performers are akin to melting butter with a blowtorch.

 

The Code:

 


Spread the love

David Lozinski