
Optimizing SQL Strategies


How many of you have come across SQL code that, for whatever reason, wasn’t written as well as it could have been? I’m not talking about digging into the inner workings of your database’s engine and milking every possible nanosecond off the execution plan.

 

I’m talking about code where you see something more like the following. This code represents about half the update statements (the other half left out for brevity). The stg1Table table being updated had several million records with no indexes. Can anyone guess before reading any further why my jaw dropped when I saw this?
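It looked roughly like this (the column names are illustrative, not the originals):

  -- Each statement scans the entire multi-million record, unindexed stg1Table
  UPDATE stg1Table SET FirstName  = LTRIM(RTRIM(FirstName));
  UPDATE stg1Table SET LastName   = LTRIM(RTRIM(LastName));
  UPDATE stg1Table SET Address1   = LTRIM(RTRIM(Address1));
  UPDATE stg1Table SET City       = LTRIM(RTRIM(City));
  UPDATE stg1Table SET PostalCode = LTRIM(RTRIM(PostalCode));
  -- ... and so on, one full-table UPDATE per column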

I constantly come across SQL code where performance obviously wasn’t considered. So that’s when this curious consultant decided to document some basic SQL optimization strategies in hopes of helping some SQL coders “lift their game”. 🙂

With that, here we go in no particular order:

 

Whenever possible, DELETE first, UPDATE second, INSERT third

I came across a stored procedure that was operating on a table with approximately 26 columns and several million records. There was no index on the table other than the primary key which was just an IDENTITY column. The stored procedure was run approximately once per week. With each run, here’s what it was coded to do:

  1. INSERT several thousand records marking them as “new”.
  2. DELETE “expired” records
  3. UPDATE “old” records

The three sets of records were mutually exclusive. That is, any records INSERTed as “new” wouldn’t be affected by the UPDATE or DELETE operations.

Ask yourself why this isn’t the best approach. Where is the inefficiency? Answer: in the order of operations.

By INSERTing several thousand new records first, the database engine has to search through that many more records to find the ones to DELETE and UPDATE later. Performing the INSERT last obviously saves the database engine from having to scan through all of the newly INSERTed records.

Therefore, you should always DELETE first whenever possible, as this presents the database engine with fewer records to search when performing subsequent UPDATEs.

Next, UPDATE current records before performing an INSERT whenever possible because, again, that’s fewer records the database engine will have to scan through.

Finally, perform your INSERT last whenever possible.
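Putting it together, the weekly procedure’s body should follow this shape (a sketch with illustrative table and column names, not the client’s actual code):

  -- 1. DELETE first: fewer rows are left for the engine to touch afterwards
  DELETE FROM dbo.MyBigTable
  WHERE RecordStatus = 'Expired';

  -- 2. UPDATE second: only existing rows are scanned, never the new ones
  UPDATE dbo.MyBigTable
  SET RecordStatus = 'Old'
  WHERE LastModifiedDate < DATEADD(DAY, -7, GETDATE());

  -- 3. INSERT last: the new rows never have to be scanned by the steps above
  INSERT INTO dbo.MyBigTable (SourceId, RecordStatus, LastModifiedDate)
  SELECT SourceId, 'New', GETDATE()
  FROM dbo.StagingTable;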

 

Use @TableVars for small amounts of table data instead of #temp tables

They’re small. Fast. And for a handful of rows they typically stay in memory, requiring little to no disk I/O.

Here’s the perfect example I came across in numerous stored procedures for one client:
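The pattern was roughly this (table and column names are illustrative): a #temp table created, loaded with a handful of lookup rows, joined once, and dropped.

  -- Original pattern: a #temp table (and the tempdb overhead that comes with it)
  -- for just a few lookup rows
  CREATE TABLE #StatusCodes
  (
      StatusCode INT,
      StatusDesc VARCHAR(50)
  );

  INSERT INTO #StatusCodes (StatusCode, StatusDesc)
  VALUES (1, 'New'), (2, 'Old'), (3, 'Expired');

  SELECT o.OrderId, sc.StatusDesc
  FROM dbo.Orders AS o
  INNER JOIN #StatusCodes AS sc ON sc.StatusCode = o.StatusCode;

  DROP TABLE #StatusCodes;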

I promptly modified the code as follows, which actually improved the overall runtime of the stored procedure:
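Roughly, using the same illustrative names as above:

  -- Rewritten with a table variable: no explicit DROP needed,
  -- and for a handful of rows it is much lighter weight
  DECLARE @StatusCodes TABLE
  (
      StatusCode INT,
      StatusDesc VARCHAR(50)
  );

  INSERT INTO @StatusCodes (StatusCode, StatusDesc)
  VALUES (1, 'New'), (2, 'Old'), (3, 'Expired');

  SELECT o.OrderId, sc.StatusDesc
  FROM dbo.Orders AS o
  INNER JOIN @StatusCodes AS sc ON sc.StatusCode = o.StatusCode;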

 

Eliminate Joins On Massive Tables When Most Columns Aren’t Used

I was modifying a stored procedure which was updating fact tables in a data warehouse in Database1. This stored procedure had several joins to a significantly larger staging table (3 dozen columns, several million records, no indexes on the columns being joined on or selected) in Database2 on the same server.

In this scenario, I managed to shave several minutes off the stored procedure runtime by creating a smaller #temptable from the joined table in Database2, containing only the 3 columns that were actually used. I then created a CLUSTERED index on the one column in the #temptable that appeared in the WHERE clause.

Here’s the old code:
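Simplified to show the pattern (all table and column names here are illustrative):

  -- Old pattern: the fact table update joins directly to the huge,
  -- unindexed staging table in the other database
  UPDATE f
  SET f.CustomerKey = s.CustomerKey,
      f.Amount      = s.Amount
  FROM Database1.dbo.FactSales AS f
  INNER JOIN Database2.dbo.stgSales AS s   -- ~3 dozen columns, millions of rows, no helpful indexes
      ON s.OrderNumber = f.OrderNumber
  WHERE s.OrderNumber > 0;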

The new code thus becomes:
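Again a sketch, with the same illustrative names:

  -- New pattern: pull only the 3 columns actually needed into a narrow #temptable ...
  SELECT s.OrderNumber, s.CustomerKey, s.Amount
  INTO #stgSalesSlim
  FROM Database2.dbo.stgSales AS s;

  -- ... index the column used in the WHERE clause ...
  CREATE CLUSTERED INDEX IX_stgSalesSlim_OrderNumber
      ON #stgSalesSlim (OrderNumber);

  -- ... then join the fact table update to the much smaller, indexed #temptable
  UPDATE f
  SET f.CustomerKey = t.CustomerKey,
      f.Amount      = t.Amount
  FROM Database1.dbo.FactSales AS f
  INNER JOIN #stgSalesSlim AS t
      ON t.OrderNumber = f.OrderNumber
  WHERE t.OrderNumber > 0;

  DROP TABLE #stgSalesSlim;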

See what I’ve done? It made a HUGE difference in the run time!

As an exercise for the reader, you can get the SQL code to execute even faster by adding a WHERE clause to the SELECT statement that creates the #tempTable.

 

If a #temptable is only used once, consider inlining its query as a JOIN to increase parallelism

Conversely, I’ve found #temptables in code where using them wasn’t advantageous. More disk I/O and resources were wasted, and it limited the database server’s ability to implement parallelism.

Here’s an example of code I rewrote:
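A representative before-and-after (illustrative names, not the original code):

  -- Before: materialize a #temptable that is only ever joined to once
  SELECT CustomerId, SUM(Amount) AS TotalAmount
  INTO #CustomerTotals
  FROM dbo.Orders
  GROUP BY CustomerId;

  SELECT c.CustomerName, t.TotalAmount
  FROM dbo.Customers AS c
  INNER JOIN #CustomerTotals AS t ON t.CustomerId = c.CustomerId;

  DROP TABLE #CustomerTotals;

  -- After: inline the same query as a derived table in the JOIN, skipping the
  -- tempdb write and giving the optimizer one plan it can parallelize
  SELECT c.CustomerName, t.TotalAmount
  FROM dbo.Customers AS c
  INNER JOIN
  (
      SELECT CustomerId, SUM(Amount) AS TotalAmount
      FROM dbo.Orders
      GROUP BY CustomerId
  ) AS t ON t.CustomerId = c.CustomerId;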

 

Indexes on #temptables

This trick has to be evaluated on a case-by-case basis because you have to weigh up the time difference between:

  1. Creating and loading the #temptable; then performing the queries AND
  2. Creating and loading the #temptable; creating the index on the necessary columns; then performing the queries.

Sometimes the time and associated cost for the server to perform #1 is practically the same as performing #2. This especially happens when the #temptable has a relatively low number of records: it takes just as much time to scan the table as it would to create and then scan the index for the values.

My general rule of thumb for considering whether or not to do this is (a quick sketch follows the list):

  1. The #temptable has at least 100,000 records in it
  2. The #temptable is in several JOINs.
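When both conditions hold, option #2 looks roughly like this (illustrative names):

  -- Load the #temptable first ...
  SELECT OrderNumber, CustomerKey, Amount
  INTO #stgOrders
  FROM dbo.stgOrders;          -- assume well over 100,000 rows

  -- ... then index the column(s) used in the subsequent JOINs
  CREATE NONCLUSTERED INDEX IX_stgOrders_OrderNumber
      ON #stgOrders (OrderNumber);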

 

Use LIKE instead of LEFT() or SUBSTRING() in WHERE clauses when Indexes are present

Unless absolutely necessary. Why? LEFT, SUBSTRING, and similar functions are just that: functions. Wrapping an indexed column in one of them prevents the database engine from taking advantage of the index, whereas a LIKE pattern without a leading wildcard can still use an index seek.
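For example, assuming an index exists on the LastName column, the two queries below return the same rows but only the second can seek that index:

  -- The function hides the column from the index: full scan
  SELECT CustomerId, LastName
  FROM dbo.Customers
  WHERE LEFT(LastName, 3) = 'SMI';

  -- A LIKE pattern with no leading wildcard can use an index seek
  SELECT CustomerId, LastName
  FROM dbo.Customers
  WHERE LastName LIKE 'SMI%';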

As a hint, if you’re going for your SQL Server MCSE Data Platform certification, questions similar to the above scenario were on the SQL Server 2012 exams, so you should know this. 🙂

 

Don’t Sort Data in a Table Contrary to a Clustered Index

Obviously pay this no mind if the clustered index is solely on an identity column. Otherwise, it generally just wastes time, because the clustered index dictates the physical order of the rows regardless of how you sort them on the way in.

Let’s say you have two columns in a table forming a clustered index as follows:
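For instance (illustrative names):

  CREATE CLUSTERED INDEX IX_SalesDetail_CustomerId_OrderDate
      ON dbo.SalesDetail (CustomerId, OrderDate);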

And if you’re lucky like me (let me roll my eyes for you at that sarcasm), you find an insert statement similar to the following:
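Something along these lines, sorting by completely different columns than the clustered index:

  -- The ORDER BY buys nothing here: the clustered index on (CustomerId, OrderDate)
  -- dictates the physical order of the rows anyway, so the sort is wasted work
  INSERT INTO dbo.SalesDetail (CustomerId, OrderDate, ProductId, Amount)
  SELECT CustomerId, OrderDate, ProductId, Amount
  FROM dbo.stgSalesDetail
  ORDER BY ProductId, Amount DESC;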

 

Filter by Subqueries in JOINs instead of WHERE clauses

The majority of the time, filtering data by using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. I believe this is because the SQL Server engine can optimize and implement parallelism within JOIN clauses more efficiently than in WHERE clauses.
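As a sketch (illustrative names):

  -- Filtering with a subquery in the WHERE clause
  SELECT o.OrderId, o.Amount
  FROM dbo.Orders AS o
  WHERE o.CustomerId IN
  (
      SELECT c.CustomerId
      FROM dbo.Customers AS c
      WHERE c.Region = 'EMEA'
  );

  -- Filtering by JOINing to the same subquery instead
  SELECT o.OrderId, o.Amount
  FROM dbo.Orders AS o
  INNER JOIN
  (
      SELECT c.CustomerId
      FROM dbo.Customers AS c
      WHERE c.Region = 'EMEA'
  ) AS flt ON flt.CustomerId = o.CustomerId;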

 

Combine multiple UPDATE statements

From my opening example: yes, I seriously came across that production code, a long run of back-to-back UPDATE statements against the unindexed, multi-million record stg1Table, each one setting a single column and each one scanning the entire table.

Rewritten by doing some simple SQL optimization magic:
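Using the same illustrative column names as the opening example, the run of single-column UPDATEs collapses into a single pass over the table:

  -- One scan of stg1Table instead of one scan per column
  UPDATE stg1Table
  SET FirstName  = LTRIM(RTRIM(FirstName)),
      LastName   = LTRIM(RTRIM(LastName)),
      Address1   = LTRIM(RTRIM(Address1)),
      City       = LTRIM(RTRIM(City)),
      PostalCode = LTRIM(RTRIM(PostalCode));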

‘nuf said. 🙂

I hope you enjoyed this article!

Do you have any SQL optimizing tips and techniques? If so, leave a comment and share with your fellow readers! Let’s make the SQL world a better, happier place for everyone. 🙂



David Lozinski