
SQL: Fastest way to compare multiple column values


I was recently working on a project with stored procedures that performed a significant number of column comparisons in a MERGE statement. This was turning into a real performance bottleneck (relatively speaking), as the entire row of data had to be updated if any one of the numerous fields was modified.

Here’s the code pattern in question. Pay particular attention to all the comparisons being performed:
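A minimal sketch of that pattern, with hypothetical table and column names standing in for the real ones (the actual statement compared far more columns):

-- Update the target row only when at least one column differs,
-- tested column by column.
MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON s.Id = t.Id
WHEN MATCHED AND
    (   t.Col1 <> s.Col1
     OR t.Col2 <> s.Col2
     OR t.Col3 <> s.Col3 )  -- ...and so on, one test per column
THEN UPDATE SET
    Col1 = s.Col1,
    Col2 = s.Col2,
    Col3 = s.Col3;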

Seeing all those column comparisons, this curious consultant started wondering: what’s the fastest way to compare multiple column values?

 

The Alternatives

The research began. Besides the technique above (which is probably the most common, as it’s pretty straightforward), here are a few other ways to accomplish the same thing:

  1. Do nothing; leave the code as is.
  2. Alter the tables to add a BINARY_CHECKSUM column that incorporates all of the table’s columns, then compare the checksum columns. This means adding a checksum column definition to each table; see the sketch after this list.
  3. Implement the STUFF method.
  4. Use CONCAT to concatenate all the values into a single value for comparison.
  5. Use HASHBYTES with CONCAT to hash all the values into a single value.
  6. Perform an on-the-fly BINARY_CHECKSUM comparison. This is case-sensitive.
  7. Perform an on-the-fly CHECKSUM comparison. This is case-insensitive.
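For option #2, the column definition added to each table might look something like the sketch below, which assumes the hypothetical table and column names used throughout this post; the real definition would list every column being compared:

-- Add a persisted computed column that checksums the comparable columns.
ALTER TABLE dbo.TargetTable
    ADD RowChecksum AS BINARY_CHECKSUM(Col1, Col2, Col3) PERSISTED;

ALTER TABLE dbo.SourceTable
    ADD RowChecksum AS BINARY_CHECKSUM(Col1, Col2, Col3) PERSISTED;

Persisting the computed column means the checksum is stored and maintained when rows are written, rather than recalculated on every read.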

 

Setting Things Up for the Speed Test

For testing purposes, the SQL code was executed on a Windows Server 2012 machine with 128 GB of memory and a 16-core CPU rated at 2.54 GHz, running Microsoft SQL Server 2014.

To ensure SQL Server didn’t keep any queries (or anything for that matter) cached, the following code was run before each test:
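The exact statements aren’t shown here, but a typical sequence for flushing the caches between runs looks like this:

CHECKPOINT;              -- write dirty pages to disk so clean buffers can be dropped
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool (data cache)
DBCC FREEPROCCACHE;      -- clear all cached execution plans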

 

Two tables were created and populated with 20 million rows each, using a subset of columns from an actual table that holds over 100 million records. The columns in the subset keep the same definitions as in the original table, and the primary key is the same.
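The real DDL isn’t reproduced here; a scaled-down, hypothetical version of the two tables, which the other sketches in this post build on, would be:

CREATE TABLE dbo.SourceTable (
    Id   bigint      NOT NULL PRIMARY KEY,
    Col1 varchar(50) NULL,
    Col2 varchar(50) NULL,
    Col3 int         NULL
);

CREATE TABLE dbo.TargetTable (
    Id   bigint      NOT NULL PRIMARY KEY,
    Col1 varchar(50) NULL,
    Col2 varchar(50) NULL,
    Col3 int         NULL
);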

 

Here is the code that’s common across every test:
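A sketch of that common harness against the hypothetical schema above: join the two tables on the shared primary key and count the rows the technique under test flags as changed. The predicate shown is technique #1, sitting in the [ PLACEHOLDER ] position that each technique replaces:

SELECT COUNT(*)
FROM dbo.TargetTable AS t
INNER JOIN dbo.SourceTable AS s
    ON s.Id = t.Id
WHERE -- [ PLACEHOLDER ] begins here
      t.Col1 <> s.Col1
   OR t.Col2 <> s.Col2
   OR t.Col3 <> s.Col3;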

 

Each technique drops its own snippet into the WHERE clause’s [ PLACEHOLDER ] section. Numbered to match the results further down, the seven variants are:

  1. Original
  2. Checksum Column
  3. STUFF
  4. CONCAT
  5. HASHBYTES
  6. BINARY_CHECKSUM
  7. CHECKSUM
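The original snippets compared many more columns than the three-column sketch used here, but illustrative versions of the seven [ PLACEHOLDER ] predicates would look like this:

-- #1 Original: compare every column individually
t.Col1 <> s.Col1 OR t.Col2 <> s.Col2 OR t.Col3 <> s.Col3

-- #2 Checksum Column: compare the persisted RowChecksum columns added earlier
t.RowChecksum <> s.RowChecksum

-- #3 STUFF: one plausible variant builds a delimited string per row, uses STUFF
--    to strip the leading delimiter, and compares the resulting strings
STUFF('|' + CAST(t.Col1 AS varchar(50)) + '|' + CAST(t.Col2 AS varchar(50))
          + '|' + CAST(t.Col3 AS varchar(20)), 1, 1, '')
 <> STUFF('|' + CAST(s.Col1 AS varchar(50)) + '|' + CAST(s.Col2 AS varchar(50))
          + '|' + CAST(s.Col3 AS varchar(20)), 1, 1, '')

-- #4 CONCAT: concatenate all the values into a single string per row
CONCAT(t.Col1, t.Col2, t.Col3) <> CONCAT(s.Col1, s.Col2, s.Col3)

-- #5 HASHBYTES: hash the concatenated values into a single value per row
HASHBYTES('SHA2_256', CONCAT(t.Col1, t.Col2, t.Col3))
 <> HASHBYTES('SHA2_256', CONCAT(s.Col1, s.Col2, s.Col3))

-- #6 On-the-fly BINARY_CHECKSUM (case-sensitive)
BINARY_CHECKSUM(t.Col1, t.Col2, t.Col3) <> BINARY_CHECKSUM(s.Col1, s.Col2, s.Col3)

-- #7 On-the-fly CHECKSUM (case-insensitive)
CHECKSUM(t.Col1, t.Col2, t.Col3) <> CHECKSUM(s.Col1, s.Col2, s.Col3)

These sketches gloss over NULL handling (with the + operator, concatenating a NULL yields NULL, whereas CONCAT treats NULL as an empty string) and collation, which real code would need to account for.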

 

The Results Are In!

Here are the results: how long, in seconds, each technique took to execute across three runs.

The winners, techniques #2 (Checksum Column) and #6 (BINARY_CHECKSUM), tied for the fastest times.

#                    Time to execute, in seconds, over 3 runs
1: Original          95, 96, 95
2: Checksum Column   84, 84, 84
3: STUFF             247, 240, 244
4: CONCAT            157, 157, 155
5: HASHBYTES         254, 253, 253
6: BINARY_CHECKSUM   84, 84, 84
7: CHECKSUM          88, 88, 87

 

The Way To Go

I didn’t know what to expect when running the tests, so I made no guesses as to which technique might run the fastest.

I was surprised by two things:

  1. The checksum column (#2) and the on-the-fly BINARY_CHECKSUM (#6) took exactly the same amount of time to complete.
  2. Both of them posted identical times across all three individual runs; there was no fluctuation at all.

 

For me, the way to go will be using BINARY_CHECKSUM on the fly in the stored procedures currently being developed, because we can’t modify the 500+ tables that would otherwise need a checksum column.
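A sketch of that direction, again with the hypothetical names used throughout this post:

MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON s.Id = t.Id
WHEN MATCHED AND BINARY_CHECKSUM(t.Col1, t.Col2, t.Col3)
              <> BINARY_CHECKSUM(s.Col1, s.Col2, s.Col3)
THEN UPDATE SET
    Col1 = s.Col1,
    Col2 = s.Col2,
    Col3 = s.Col3;

One caveat: BINARY_CHECKSUM (like CHECKSUM) can return the same value for different inputs, so in rare cases a changed row could be skipped. If even one missed change is unacceptable, the explicit column comparisons or HASHBYTES are the safer, if slower, choices.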

As for everyone else, I’ve left you the SQL code above, so feel free to use it as a basis for conducting your own performance benchmarks.

If anyone else has any tricks or techniques for comparing multiple columns quickly in a MERGE or WHERE clause, definitely leave a comment below and share the knowledge! 🙂

 



David Lozinski