5

C#: Is It Faster to Preallocate Dictionary Sizes?

C#: Is It Faster to Preallocate Dictionary Sizes?

This test will benchmark populating C# Dictionary and ConcurrentDictionary objects to determine in C#: is it faster to preallocate dictionary sizes? Or not?

When a Dictionary object is created, its initial capacity is set to a system default. As more key/value pairs are added, C# automatically resizes the dictionary to accommodate. While this is great to have, it takes precious CPU cycles to allocate new memory and scale the growing object.

What happens if the maximum number of key/value pairs that are going to be added is already known? The Dictionary and ConcurrentDictionary constructors accept a parameter to set the initial sizing upon creation.

But does this actually save significant amounts of time? Will this micro-optimize code for greater performance?

So that’s when this curious consultant started wondering in C#: is it faster to preallocate dictionary sizes or not? What kind of difference does this make on simple data types like ints and strings?

The Set Up:

I wrote a C# Console application to test the speed differences between using preallocated dictionary sizes and not. For the ConcurrentDictionary types, I tested using both single and parallel loops.

The code is written in Visual Studio 2013 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system.

Here are the Constructors used:


Definition:


Constructor Code Snippets Utilized:


Dictionary<string,string>


Dictionary<string,string>()
Dictionary<string,string>(numberOfItemsToAdd)


Dictionary<int,int>


Dictionary<int,int>()
Dictionary<int,int>(numberOfItemsToAdd)


Dictionary<int,string>


Dictionary<int,string>()
Dictionary<int,string>(numberOfItemsToAdd)


Dictionary<string,int>


Dictionary<string,int>()
Dictionary<string,int>(numberOfItemsToAdd)


Dictionary<Guid,int>


Dictionary<Guid,int>()
Dictionary<Guid,int>(numberOfItemsToAdd)


ConcurrentDictionary<string,string>


ConcurrentDictionary<string,string>()
ConcurrentDictionary<string,string>(iConcurrencyLevel,numberOfItemsToAdd)


ConcurrentDictionary<int,int>


ConcurrentDictionary<int,int>()
ConcurrentDictionary<int,int>(iConcurrencyLevel,numberOfItemsToAdd)


ConcurrentDictionary<int,string>


ConcurrentDictionary<int,string>()
ConcurrentDictionary<int,string>(iConcurrencyLevel,numberOfItemsToAdd)


ConcurrentDictionary<string,int>


ConcurrentDictionary<string,int>()
ConcurrentDictionary<string,int>(iConcurrencyLevel,numberOfItemsToAdd)


ConcurrentDictionary<Guid,int>


ConcurrentDictionary<Guid,int>()
ConcurrentDictionary<Guid,int>(iConcurrencyLevel,numberOfItemsToAdd)

The code assumes:

  1. There are no duplicate keys (to avoid rehashing times)
  2. Usage of the Add and TryAdd methods as opposed to adding directly via index. That is, Dictionary[x] = value.
  3. Just using simple data types Guids, strings, and ints since I believe they are the most common elements stored in dictionary objects.

The exe file was installed and run on an Alienware M17X R3 running Windows 7 64-bit with 16 GB memory on an i7-2820QM processor.

The test was run for each Dictionary over the following number of items:

  • 42,949,672
  • 21,474,836
  • 214,748
  • 2,147

Ready! Set! Go!

Before starting, my hypothesis was that I expected it to be faster to pre-size a Dictionary and ConcurrentDictionary when the size is known.

Let’s see what happened on my machine over multiple runs.

All times are indicated in minutes:seconds.milliseconds format.
Lower numbers indicate faster performance.
Winners marked in green.
“U” indicates dictionary objects whose size is “unallocated”;
“P” indicates dictionary objects whose size was “preallocated” through the appropriate constructor.

BENCHMARK RUN #1

@2,147

@214,748

@21,474,836

@42,949,672

SINGLE THREADED RUNS:

 

 

 

 

U: Dictionary<string,string>

00:00.0002273

00:00.0361263

00:06.0409926

00:14.5659040

P:

00:00.0001438

00:00.0312376

00:04.7349324

00:09.4466955

 

 

 

 

 

U: Dictionary<int, int>

00:00.0002041

00:00.0072876

00:02.6773140

00:08.2096779

P:

00:00.0000714

00:00.0039430

00:00.4244548

00:00.7852797

 

 

 

 

 

U: Dictionary<int,string>

00:00.0002930

00:00.0096544

00:03.0010987

00:03.5117704

P:

00:00.0000375

00:00.0042561

00:00.4504702

00:00.8994929

 

 

 

 

 

U: Dictionary<string,int>

00:00.0003716

00:00.0272347

00:07.2401504

00:14.9067729

P:

00:00.0002188

00:00.0236686

00:04.7413807

00:09.3155774

 

 

 

 

 

U: Dictionary<Guid,int>

00:00.0062369

00:00.0187706

00:04.9121037

00:10.3142391

P:

00:00.0003408

00:00.0143823

00:03.7520816

00:08.3139629

 

 

 

 

 

U: ConcurrentDictionary<string,string>

00:00.0019876

00:00.1438790

01:01.0350905

01:17.3126081

P:

00:00.0004439

00:00.1577400

00:53.7528088

01:15.0452711

 

 

 

 

 

U: ConcurrentDictionary<int, int>

00:00.0247526

00:00.0578819

00:26.1670720

00:15.7318800

P:

00:00.0002210

00:00.0563052

00:19.4801571

00:20.7976547

 

 

 

 

 

U: ConcurrentDictionary<int,string>

00:00.0094614

00:00.0482261

00:32.5955766

00:18.6536895

P:

00:00.0002626

00:00.0475963

00:20.4774946

00:21.5232307

 

 

 

 

 

U: ConcurrentDictionary<string,int>

00:00.0109010

00:00.1134774

00:44.1907482

01:28.5362895

P:

00:00.0004631

00:00.1276015

00:34.4437704

01:27.5317917

 

 

 

 

 

U: ConcurrentDictionary<Guid,int>

00:00.0054150

00:00.1006916

00:27.2203878

01:19.9240067

P:

00:00.0009672

00:00.0934570

00:29.5456891

01:29.7403254

PARALLEL RUNS:

 

 

 

 

U: ConcurrentDictionary<string,string>

00:00.0137560

00:00.1330673

00:42.4167889

01:24.3241402

P:

00:00.0028661

00:00.1476621

00:38.6163251

01:49.4208146

 

 

 

 

 

U: ConcurrentDictionary<int, int>

00:00.0010232

00:00.1116703

00:16.9300494

00:26.4421446

P:

00:00.0002809

00:00.1539309

00:18.0051088

00:28.2439416

 

 

 

 

 

U: ConcurrentDictionary<int,string>

00:00.0010009

00:00.0689625

00:15.4101132

00:22.9102939

P:

00:00.0005123

00:00.0875107

00:16.3503571

00:26.0862159

 

 

 

 

 

U: ConcurrentDictionary<string,int>

00:00.0406893

00:00.1122130

00:34.8995581

01:19.5162128

P:

00:00.0009393

00:00.1962715

00:34.7003064

01:30.2331389

 

 

 

 

 

U: ConcurrentDictionary<Guid,int>

00:00.0130143

00:00.0924089

00:28.2361962

01:37.8934092

P:

00:00.0040962

00:00.1489778

00:30.2954604

01:38.7971114

 

BENCHMARK RUN #2

@2,147

@214,748

@21,474,836

@42,949,672

SINGLE THREADED RUNS:

 

 

 

 

U: Dictionary<string,string>

00:00.0002246

00:00.0368734

00:06.1298436

00:14.3620488

P:

00:00.0005824

00:00.0343868

00:04.7487702

00:09.5577737

 

 

 

 

 

U: Dictionary<int, int>

00:00.0005194

00:00.0285463

00:02.6909753

00:08.2321708

P:

00:00.0001438

00:00.0157965

00:00.4089019

00:01.2366061

 

 

 

 

 

U: Dictionary<int,string>

00:00.0005239

00:00.0343824

00:02.9518497

00:03.4986124

P:

00:00.0001509

00:00.0180981

00:00.4587924

00:00.9542032

 

 

 

 

 

U: Dictionary<string,int>

00:00.0010148

00:00.0516690

00:07.3156187

00:14.9102792

P:

00:00.0005940

00:00.0585386

00:04.8517893

00:09.4417711

 

 

 

 

 

U: Dictionary<Guid,int>

00:00.0094231

00:00.0208728

00:04.8564556

00:10.3336003

P:

00:00.0001513

00:00.0144695

00:03.7871221

00:08.1134524

 

 

 

 

 

U: ConcurrentDictionary<string,string>

00:00.0021180

00:00.1658501

01:02.1829562

01:12.0873516

P:

00:00.0012461

00:00.1785896

00:53.5872482

01:14.8892309

 

 

 

 

 

U: ConcurrentDictionary<int, int>

00:00.0346361

00:00.0844247

00:36.7851791

00:29.4151413

P:

00:00.0006802

00:00.1117535

00:29.4077420

00:22.6934690

 

 

 

 

 

U: ConcurrentDictionary<int,string>

00:00.0337941

00:00.0738138

00:37.2342303

00:28.7153391

P:

00:00.0007865

00:00.1125083

00:24.1407912

00:24.0398419

 

 

 

 

 

U: ConcurrentDictionary<string,int>

00:00.0449244

00:00.1344482

00:41.2863518

01:25.7849008

P:

00:00.0012050

00:00.1089779

00:44.3634281

01:27.0357715

 

 

 

 

 

U: ConcurrentDictionary<Guid,int>

00:00.0028313

00:00.1041562

00:27.5649019

01:21.9297903

P:

00:00.0007911

00:00.0929417

00:27.8663068

01:29.9350963

PARALLEL RUNS:

 

 

 

 

U: ConcurrentDictionary<string,string>

00:00.0173839

00:00.1949878

00:41.9971306

01:28.7492204

P:

00:00.0009111

00:00.3810229

00:46.8335379

01:36.2770372

 

 

 

 

 

U: ConcurrentDictionary<int,int>

00:00.0123210

00:00.1447847

00:19.2352523

00:31.4598567

P:

00:00.0014105

00:00.1847258

00:17.6200736

00:29.3366795

 

 

 

 

 

U: ConcurrentDictionary<int,string>

00:00.0095562

00:00.1152861

00:14.2139788

00:25.6892789

P:

00:00.0008151

00:00.1441474

00:14.4854650

00:23.7325534

 

 

 

 

 

U: ConcurrentDictionary<string,int>

00:00.0065180

00:00.1597130

00:34.9380627

01:23.4375095

P:

00:00.0011474

00:00.1814429

00:42.8966195

01:08.1270493

 

 

 

 

 

U: ConcurrentDictionary<Guid,int>

00:00.0164657

00:00.1037966

00:29.2891880

01:25.1618407

P:

00:00.0016523

00:00.1779831

00:32.1869650

01:19.9387991

Well, what an interesting set of results!

For regular Dictionary objects, there weren’t any surprises. The preallocated dictionaries ran faster across the board. When using ints as keys the preallocated dictionaries ran significantly faster (by a factor of 10) than their non preallocated counterparts!

Now the ConcurrentDictonaries… I don’t even know where to begin. What a mixed bag!

  • The “U” ConcurrentDictionary<string,string> took 10x longer to run than the “U” Dictionary<string,string>. 10x longer!
  • The Parallel runs didn’t save the ConcurrentDictionary<string,string> speeds. For example, in the case of 42,949,672 both the “U” and “P” concurrentdictionaries took well over a minute longer than the regular dictionary!
  • When preallocating the sizes of the concurrentdictionaries, the parallel loops typically took longer to populate than the non-preallocated concurrentdictionaries. But in a single-threaded loop they didn’t. What the heck?!

Coding techniques to be reconsidered.

On my system, unless someone spots a flaw in my test code, I’ll have to reconsider how to do things.

For single-thread usage, no doubt that preallocating Dictionary objects yields the fastest performance.

If I’m going to work with large data sets that need storing in a dictionary object of sorts that are only written to once and read thereafter, the results above would indicate that Dictionary objects rather than ConcurrentDictionary objects are better suited, even when multi-threaded.

I’d really love it too if someone could explain to me why Parallel loops took longer to add items to ConcurrentDictionary objects with their size preallocated than ConcurrentDictionary objects whose size wasn’t specified to begin with!

This is totally unacceptable. Surely the geniuses working at Microsoft can figure out a way to get the performance closer to that of a regular dictionary when adding items?

Obviously results may vary, and you should test on your system before micro-optimizing this functionality in your C# application.

As promised, here’s the code for you. Happy benchmarking! 🙂

The Code:

(Visited 557 times, 1 visits today)
  • Erik

    Great post!

    You should do a GC.Collect(2, GCCollectionMode.Forced, true); right after your dict.Clear() to ensure that a garbage collection belonging to the previous test do not occur while performing the next test that mess up the results.

    • FireMystdl

      Fair point. 🙂 I’ll update the code with your suggestion.

  • Paulo Zemek

    Your preallocated values aren’t primes. This is making the preallocated results much worse than they should be when hashcodes aren’t sequential. This is probably why in some cases the preallocated was even slower for the ConcurrentDictionary. The non-preallocated lost time resizing, but always used prime values as the capacity, avoiding too many collisions, while the preallocated didn’t lose time with resizes, but had too many collisions. Use a prime bigger than the amount of items you want to add and test again. I am sure the results will improve. Surely the ConcurrentDictionary will still be slower than a normal dictionary when running from a single thread, but it will have a much better performance by avoiding bucket collisions.

    • Dave

      What does a preallocation with a prime number have to do with it? Are you saying if I have 13 items and preallocate a Dictionary size at 14,15, or 16 it will perform worse than if I preallocate one at size 13 or 17?

      • Floris Robbemont

        When creating a dictionary with a capacity, the code will automatically find the nearest larger prime number when it initializes:

        private void Initialize(int capacity)
        {
        int prime = HashHelpers.GetPrime(capacity);
        this.buckets = new int[prime];
        for (int index = 0; index < this.buckets.Length; ++index)
        this.buckets[index] = -1;

        this.entries = new Dictionary.Entry[prime];
        this.freeList = -1;
        }

        This code is from the .NET source.