
SQL: Fastest way to compare multiple column values


I was recently working on a project with stored procedures that performed a significant number of column comparisons inside a MERGE statement. This was turning into a real performance bottleneck (relatively speaking), since the entire row of data had to be updated if any one of the numerous fields had changed.

Here’s the code in question. Pay particular attention to all the comparisons being performed:
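The shape of that MERGE was roughly the following sketch. The table and column names here are hypothetical stand-ins, and the real procedure compared far more columns:

```sql
MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.ID = s.ID
WHEN MATCHED AND (
       t.FirstName <> s.FirstName
    OR t.LastName  <> s.LastName
    OR t.Address1  <> s.Address1
    OR t.City      <> s.City
    OR t.State     <> s.State
    OR t.Zip       <> s.Zip
    -- ...and so on, one comparison per column
)
THEN UPDATE SET
    t.FirstName = s.FirstName,
    t.LastName  = s.LastName,
    t.Address1  = s.Address1,
    t.City      = s.City,
    t.State     = s.State,
    t.Zip       = s.Zip
WHEN NOT MATCHED THEN
    INSERT (ID, FirstName, LastName, Address1, City, State, Zip)
    VALUES (s.ID, s.FirstName, s.LastName, s.Address1, s.City, s.State, s.Zip);
```

Note that `<>` treats NULL as "not comparable", so production code typically wraps each side in ISNULL or COALESCE, which adds even more work per row.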

Seeing all those column comparisons, this curious consultant started wondering: what’s the fastest way to compare multiple column values?

The Alternatives

The research began. Besides the technique above (probably the most common, as it’s pretty straightforward), here are a few other ways to do the same thing:

  1. Nothing. Leave the code as is.
  2. Alter the tables to add a BINARY_CHECKSUM computed column that incorporates all of the table’s columns, then compare the checksum columns.
  3. Implement the STUFF method.
  4. Use CONCAT to concatenate all the values into a single value for comparison.
  5. Use HASHBYTES with CONCAT to hash all the values into a single value.
  6. Perform an on-the-fly BINARY_CHECKSUM comparison. This is case-sensitive.
  7. Perform an on-the-fly CHECKSUM comparison. This is case-insensitive.
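For option 2, the column definition added to each table would look something like this (table and column names are hypothetical stand-ins):

```sql
ALTER TABLE dbo.TargetTable
    ADD RowCheckSum AS BINARY_CHECKSUM(FirstName, LastName, Address1, City, State, Zip) PERSISTED;
```

Marking the computed column PERSISTED means the checksum is calculated once at write time rather than on every read, which is what makes the later comparison cheap.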

Setting Things Up for the Speed Test

For testing purposes, the SQL code was executed on a Windows Server 2012 machine with 128 GB of memory and a 16-core CPU rated at 2.54 GHz, running Microsoft SQL Server 2014.

To ensure SQL Server didn’t keep any queries (or anything for that matter) cached, the following code was run before each test:
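A typical cache-clearing step for this kind of benchmark looks like the following (do not run this on a production server):

```sql
CHECKPOINT;             -- flush dirty pages to disk so clean buffers can be dropped
DBCC DROPCLEANBUFFERS;  -- empty the buffer pool (data cache)
DBCC FREEPROCCACHE;     -- clear all cached execution plans
```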

Two tables were created and populated with 20 million rows each, using a subset of columns from an actual table that has over 100 million records. The subset of columns has the same schema as the original table, and the primary keys are the same.

Here is the code that’s common across every test:
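The common harness boils down to a query over the two joined tables, with each technique’s comparison dropped into the [ PLACEHOLDER ] slot. A minimal sketch, again with hypothetical table and column names:

```sql
SELECT COUNT(*)
FROM dbo.TargetTable AS t
INNER JOIN dbo.SourceTable AS s
    ON t.ID = s.ID             -- same primary key on both tables
WHERE [ PLACEHOLDER ];         -- swapped out per technique
```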

Here are the full code snippets that are in the WHERE clause’s [ PLACEHOLDER ] section for each technique:

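As sketches (the exact originals aren’t reproduced here), the seven predicates look roughly like this, assuming hypothetical columns FirstName, LastName, and City on aliases t and s:

```sql
-- 1: Original column-by-column comparison
t.FirstName <> s.FirstName OR t.LastName <> s.LastName OR t.City <> s.City

-- 2: Pre-computed BINARY_CHECKSUM columns added to both tables
t.RowCheckSum <> s.RowCheckSum

-- 3: STUFF - build one delimited string per row, stripping the leading comma
STUFF(',' + t.FirstName + ',' + t.LastName + ',' + t.City, 1, 1, '')
    <> STUFF(',' + s.FirstName + ',' + s.LastName + ',' + s.City, 1, 1, '')

-- 4: CONCAT all values into a single string (a delimiter avoids false matches)
CONCAT(t.FirstName, '|', t.LastName, '|', t.City)
    <> CONCAT(s.FirstName, '|', s.LastName, '|', s.City)

-- 5: HASHBYTES over the concatenated values
HASHBYTES('SHA2_256', CONCAT(t.FirstName, '|', t.LastName, '|', t.City))
    <> HASHBYTES('SHA2_256', CONCAT(s.FirstName, '|', s.LastName, '|', s.City))

-- 6: On-the-fly BINARY_CHECKSUM (case-sensitive)
BINARY_CHECKSUM(t.FirstName, t.LastName, t.City)
    <> BINARY_CHECKSUM(s.FirstName, s.LastName, s.City)

-- 7: On-the-fly CHECKSUM (case-insensitive)
CHECKSUM(t.FirstName, t.LastName, t.City)
    <> CHECKSUM(s.FirstName, s.LastName, s.City)
```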

The Results Are In!

Here are the results: how long each technique took to execute, in seconds, over 3 runs. The winners, at a flat 84 seconds per run, were the checksum column (#2) and the on-the-fly BINARY_CHECKSUM (#6).

  1. Original: 95, 96, 95
  2. Checksum column: 84, 84, 84
  3. STUFF: 247, 240, 244
  4. CONCAT: 157, 157, 155
  5. HASHBYTES: 254, 253, 253
  6. BINARY_CHECKSUM: 84, 84, 84
  7. CHECKSUM: 88, 88, 87

The Way To Go

I didn’t know what to expect when running the tests, so I made no guesses as to which technique might run the fastest.

I was surprised by two things:

  1. The checksum column and the on-the-fly BINARY_CHECKSUM took the same time to complete.
  2. Both of the above ran in exactly the same amount of time across all 3 runs. There was no fluctuation.

For me, the way to go will be using BINARY_CHECKSUM in the stored procedures currently being developed, because we can’t modify the 500+ tables that would otherwise need a checksum column.

As for everyone else, I’ve left you the SQL code above, so feel free to use it as a basis for conducting your own performance benchmarks.

If anyone else has any tricks or techniques for comparing multiple columns quickly in a MERGE or WHERE clause, definitely leave a comment below and share the knowledge! 🙂

  • Norman

    And what about “except solution”? Except is the easiest to write but it is quite slow (from my test). Maybe there is a set of attribute type that will be faster than classic “where solution” (number 1 from your solution)

    Be aware also that coalesce/isnull of null date will be valid date – 1900-01-01 (on date type)
    Try this:
    declare @null_date date = null, @valid_date date = '19000101'
    select @null_date, @valid_date where coalesce(@null_date, '') = coalesce(@valid_date, '')

    In some databases date 1900-01-01 means something so it may be a hard to find “feature”.

  • Marty

    Very good article and informative. Thanks!
    Plus good point by Spyder to consider…

    • VoiceOver

      It’s not a good point, it’s a deadly point which renders BINARY_CHECKSUM unusable

      • echovault

        BINARY_CHECKSUM is still usable, for example, if all the columns in question are ints and varchars

  • Spyder

    Be careful about using checksums to compare.

    Consider this:

    SELECT BINARY_CHECKSUM('113182', '', '', '', '', '', '', '', '0')
    RESULT 52560514

    SELECT BINARY_CHECKSUM('113282', '', '', '', '', '', '', '0', '0')
    RESULT 52560514

    • VoiceOver

      Exactly, binary_checksum and checksum are completely unusable as they are incorrect calculations. Even worse:
      select BINARY_CHECKSUM(360.000)
      RESULT 1023732201
      select BINARY_CHECKSUM(36000.0)
      RESULT 1023732201

      Really a shame as it is a fast method, ah well…, nice try Microsoft