C# .Net: Fastest Way to Clear Collections

In this article I’ll investigate different ways to clear out commonly used collections such as HashSets, Dictionaries, ConcurrentDictionaries, ArrayLists, and arrays, and benchmark the results to determine the fastest way to clear collections in C# .Net.

Almost every major C# application uses at least one of the following collection types:

  • Array (A)
  • ArrayList (AL)
  • ConcurrentDictionary (CD)
  • Dictionary (D)
  • HashSet (H)

Most of the programmers behind those applications tend to be “lazy”: they don’t explicitly clean up and clear out their data when finished, and instead leave it to the internals of the C# runtime to do so.

That got this curious consultant wondering: in C# .Net, what is the fastest way to clear collections while minimizing overall performance impact? Is it calling their .Clear() methods? Iterating over each element and setting it to null? Another way?

 

The Set Up:

I wrote a C# Console application to test numerous techniques.

The code is written in Visual Studio 2013 targeting .Net Framework version 4.5 x64. The source code is available at the end of this blog so you can benchmark it on your own system if you wish.

In a nutshell, here are the methods:

| # | Technique |
| --- | --- |
| A1 | Set each array element = null |
| A2 | Array.Clear |
| A3 | Set each array element = null using Parallel.For |
| AL1 | Remove each ArrayList item |
| AL2 | ArrayList.Clear() |
| AL3 | Remove each ArrayList item using Parallel.For |
| CD1 | Set each item in ConcurrentDictionary = null |
| CD2 | ConcurrentDictionary.Clear() |
| CD3 | Set each item in ConcurrentDictionary = null using Parallel.For |
| D1 | Set each item in Dictionary = null |
| D2 | Dictionary.Clear() |
| D3 | Set each item in Dictionary = null using Parallel.For |
| H1 | Remove each item from the HashSet |
| H2 | HashSet.Clear() |
| H3 | Remove each item from the HashSet using Parallel.For |

The code assumes all numbers are positive integers.
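To make the three flavours concrete, here is a rough sketch of what the array techniques (A1, A2, A3) could look like. This is not the benchmark's actual source; the class name, method names, and the `BuildArray` helper are made up for illustration, and the array is assumed to hold boxed positive integers.

```csharp
using System;
using System.Threading.Tasks;

public static class ArrayClearSketch
{
    // Hypothetical helper: fills an array with boxed positive integers.
    public static object[] BuildArray(int count)
    {
        var items = new object[count];
        for (int i = 0; i < count; i++)
        {
            items[i] = i + 1;
        }
        return items;
    }

    // A1: assign null to every slot in a plain loop.
    public static void ClearByLoop(object[] items)
    {
        for (int x = 0; x < items.Length; x++)
        {
            items[x] = null;
        }
    }

    // A2: let the runtime reset the whole range in one call.
    public static void ClearByArrayClear(object[] items)
    {
        Array.Clear(items, 0, items.Length);
    }

    // A3: same assignment as A1, with the index range split across threads.
    public static void ClearByParallelFor(object[] items)
    {
        Parallel.For(0, items.Length, x => { items[x] = null; });
    }

    public static void Main()
    {
        object[] items = BuildArray(1000);
        ClearByArrayClear(items);

        // Every slot should now be null; the array length is unchanged.
        bool allNull = Array.TrueForAll(items, item => item == null);
        Console.WriteLine("All slots null: " + allNull + ", length: " + items.Length);
    }
}
```

All three leave the array at its original length with every slot set to null; they differ only in how the slots get reset.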

The exe file was installed and run on an Alienware M17X R3 running Windows 7 64-bit with 16 GB memory on an i7-2820QM processor.

Each technique was tested with 10,737,418 and with 2,147,483 integer “objects”.

Let the Benchmarks… BEGIN!

Before starting, I expected the .Clear() methods to be the fastest. They’re certainly the easiest to use, only needing one line of code.

Let’s see what happened on my machine over multiple runs.

All times are shown in minutes:seconds.fraction format, with the fractional seconds given to seven decimal places. Lower numbers indicate faster performance.
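For anyone wanting to produce timings in the same shape, here is a minimal sketch using `Stopwatch` and a `TimeSpan` custom format string. It assumes, rather than reproduces, how the benchmark measured things; the `Time` and `Format` helpers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns the elapsed time.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Formats as minutes:seconds.fraction with seven fractional digits,
    // e.g. 320018 ticks (100 ns each) becomes "00:00.0320018".
    public static string Format(TimeSpan elapsed)
    {
        return elapsed.ToString(@"mm\:ss\.fffffff");
    }

    public static void Main()
    {
        // Fill a HashSet with positive integers, then time a single Clear().
        var set = new HashSet<int>();
        for (int i = 1; i <= 2147483; i++)
        {
            set.Add(i);
        }

        TimeSpan elapsed = Time(() => set.Clear());
        Console.WriteLine("H2: HashSet.Clear()  " + Format(elapsed));
    }
}
```

A single timed call like this is noisy, which is one reason to repeat the run and compare, as done here with two benchmark runs.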

The lowest time in each group wins; there are no points for second place.

| BENCHMARK RUN #1 | @ 10,737,418 | @ 2,147,483 |
| --- | --- | --- |
| A1: o[x] = null | 00:00.0320018 | 00:00.0050003 |
| A2: Array.Clear() | 00:00.0050003 | 00:00.0010000 |
| A3: o[x] = null w/ Parallel.For | 00:00.0310017 | 00:00.0060004 |
| AL1: al1.Remove() | – too long – | – too long – |
| AL2: ArrayList.Clear() | 00:00.0070004 | 00:00.0020001 |
| AL3: al1.Remove() w/ Parallel.For | – too long – | – too long – |
| CD1: cd1[x] = null | 00:00.6440368 | 00:00.1270073 |
| CD2: ConcurrentDictionary.Clear() | 00:00.2070118 | 00:00.0470027 |
| CD3: cd1[x] = null w/ Parallel.For | 00:00.1880108 | 00:00.0440025 |
| D1: d1[x] = null | 00:00.2160124 | 00:00.0420024 |
| D2: Dictionary.Clear() | 00:00.0430025 | 00:00.0080004 |
| D3: d1[x] = null w/ Parallel.For | 00:00.8700497 | 00:00.1870107 |
| H1: h1.Remove() | 00:00.4250243 | 00:00.0870050 |
| H2: HashSet.Clear() | 00:00.0240014 | 00:00.0050003 |
| H3: h1.Remove() w/ Parallel.For | 00:01.8021030 | 00:00.2790159 |

 

| BENCHMARK RUN #2 | @ 10,737,418 | @ 2,147,483 |
| --- | --- | --- |
| A1: o[x] = null | 00:00.0370013 | 00:00.0050002 |
| A2: Array.Clear() | 00:00.0050001 | 00:00.0010000 |
| A3: o[x] = null w/ Parallel.For | 00:00.0320010 | 00:00.0060009 |
| AL1: al1.Remove() | – too long – | – too long – |
| AL2: ArrayList.Clear() | 00:00.0070006 | 00:00.0020005 |
| AL3: al1.Remove() w/ Parallel.For | – too long – | – too long – |
| CD1: cd1[x] = null | 00:00.6451269 | 00:00.1260902 |
| CD2: ConcurrentDictionary.Clear() | 00:00.2108122 | 00:00.0469031 |
| CD3: cd1[x] = null w/ Parallel.For | 00:00.1854128 | 00:00.0420175 |
| D1: d1[x] = null | 00:00.2160572 | 00:00.0430008 |
| D2: Dictionary.Clear() | 00:00.0440257 | 00:00.0070009 |
| D3: d1[x] = null w/ Parallel.For | 00:00.8809255 | 00:00.1902134 |
| H1: h1.Remove() | 00:00.4260029 | 00:00.0830387 |
| H2: HashSet.Clear() | 00:00.0240008 | 00:00.0050002 |
| H3: h1.Remove() w/ Parallel.For | 00:01.7945270 | 00:00.2794756 |

 

Only One Surprise in the Results

Well, everything performed as expected except for the ConcurrentDictionary. For whatever reason, the built-in .Clear() method (CD2) ran slower than a Parallel.For loop with locking (CD3)! Apparently Microsoft didn’t parallelize the underlying implementation. Even I could work for Microsoft, given that such simple code ran faster than their built-in method. 😉

Otherwise, no tears were shed for any of the other results. Using the built-in .Clear() methods is not only the fastest approach, but certainly the simplest code-wise.

Obviously results may vary, and you should test on your system before micro-optimizing this functionality in your C# .Net application.
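One thing worth keeping in mind when comparing the "set to null" techniques with .Clear(): they don't leave the collection in the same state. The sketch below assumes a `Dictionary<int, object>` keyed by positive integers (the actual types in the benchmark may differ), and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NullVersusClearSketch
{
    // Hypothetical helper: keys 1..count, each mapped to a boxed integer.
    public static Dictionary<int, object> Build(int count)
    {
        var dictionary = new Dictionary<int, object>(count);
        for (int x = 1; x <= count; x++)
        {
            dictionary[x] = x;
        }
        return dictionary;
    }

    // D1-style: overwrite every value with null; the keys stay in place.
    public static void SetValuesToNull(Dictionary<int, object> dictionary, int count)
    {
        for (int x = 1; x <= count; x++)
        {
            dictionary[x] = null;
        }
    }

    public static void Main()
    {
        Dictionary<int, object> dictionary = Build(1000);

        SetValuesToNull(dictionary, 1000);
        Console.WriteLine(dictionary.Count); // 1000: entries remain, values are null

        dictionary.Clear();
        Console.WriteLine(dictionary.Count); // 0: entries are gone
    }
}
```

So nulling the values releases the referenced objects but keeps every key, while .Clear() empties the collection outright. Which one you want depends on whether the keys are still needed.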

 

To All the “Lazy” Programmers

Really… you have no excuse now not to add one line of code to clear these collections when you’re done using them.

The code is below so you can run your own benchmarks and see I’m not making this stuff up. 🙂

 

The Code:

 

  • Jim

    Love your site! Interesting stuff! I’m still a bit of a newb but have a question.

    Should you use the Dispose() call? Or Clear()? Or both (Clear and then Dispose?)

    Or do you use them differently depending on what you want to do next?
    Dispose if you are done and Clear if you want to clear yet reuse the same object?

    • DM

      Basically I use them the way you mentioned in your last question — Clear() first if I plan to reuse the object; otherwise I Clear() first to make sure all the references contained by the objects within are cleared out, then Dispose() to discard any resources the collection object itself may have a hold of.

  • Sabacc

    ConcurrentDictionary’s Clear() is only this slow because the dictionary is being created with an insane concurrencyLevel value (the first constructor parameter, which is set to MAX). With a reasonable value, Clear() is actually pretty damn fast.
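The comment above refers to the `concurrencyLevel` argument of the `ConcurrentDictionary<TKey, TValue>(int concurrencyLevel, int capacity)` constructor. Here is a rough sketch for checking that claim on your own machine. The element count and the two concurrency levels are illustrative picks, not the values used in the benchmark, and the helper names are made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

public static class ConcurrencyLevelSketch
{
    // Hypothetical helper: builds a dictionary with an explicit
    // concurrencyLevel and fills it with boxed positive integers.
    public static ConcurrentDictionary<int, object> Build(int concurrencyLevel, int count)
    {
        var dictionary = new ConcurrentDictionary<int, object>(concurrencyLevel, count);
        for (int x = 1; x <= count; x++)
        {
            dictionary[x] = x;
        }
        return dictionary;
    }

    // Times a single Clear() call on the given dictionary.
    public static TimeSpan TimeClear(ConcurrentDictionary<int, object> dictionary)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        dictionary.Clear();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int count = 1000000;                    // illustrative size
        int modestLevel = Environment.ProcessorCount; // one reasonable choice
        const int largeLevel = 65536;                 // illustrative "too high" value

        TimeSpan modest = TimeClear(Build(modestLevel, count));
        TimeSpan large = TimeClear(Build(largeLevel, count));

        Console.WriteLine("concurrencyLevel " + modestLevel + ": " + modest);
        Console.WriteLine("concurrencyLevel " + largeLevel + ": " + large);
    }
}
```

No expected numbers are given here on purpose: the gap, if any, depends on the runtime version and the hardware.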