Spread the love

Optimizing SQL Strategies

How many of you have come across SQL code that, for whatever reason, wasn’t written as good as it could have been? I’m not talking about going into the inner workings of your database’s engine and milking every possibly nanosecond off the execution plan.

I’m talking about code where you see something more like the following. This code represents about half the update statements (the other half left out for brevity). The stg1Table table being updated had several million records with no indexes. Can anyone guess before reading any further why my jaw dropped when I saw this?

-- beginning of the updates

UPDATE stg1Table
SET    AccType = 'RE'
WHERE CHARINDEX('Romeo', CatchPhrase) > 0

UPDATE stg1Table
SET    AccType = 'TL'
WHERE CHARINDEX('Tango', CatchPhrase) > 0

UPDATE stg1Table
SET    AccType = 'VA'
WHERE CHARINDEX('Victor', CatchPhrase) > 0

UPDATE stg1Table
SET    AccType = 'WC'
WHERE CHARINDEX('Whiskey', CatchPhrase) > 0

UPDATE Stg1Table
SET    AccType = 'BT'
WHERE CHARINDEX('Bravo', CatchPhrase) > 0

UPDATE Stg1Table
SET   AccType = 'SC'
WHERE CHARINDEX('Sierra', CatchPhrase) > 0

UPDATE Stg1Table
SET    AccType = 'EC'
WHERE CHARINDEX('Echo', CatchPhrase) > 0

 --  : more similar updates omitted for brevity
 --  : This was unbelieveable!

-- beginning of the updates

UPDATE stg1Table

SET AccType = 'RE'

WHERE CHARINDEX('Romeo', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'TL'

WHERE CHARINDEX('Tango', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'VA'

WHERE CHARINDEX('Victor', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'WC'

WHERE CHARINDEX('Whiskey', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'BT'

WHERE CHARINDEX('Bravo', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'SC'

WHERE CHARINDEX('Sierra', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'EC'

WHERE CHARINDEX('Echo', CatchPhrase) > 0

-- : more similar updates omitted for brevity

-- : This was unbelieveable!

I constantly come across SQL code where performance is obviously not considered. So that’s when this curious consultant decided to document some basic Optimization SQL Strategies in hopes of helping some SQL coders “lift their game”. 🙂

With that, here we go in no particular order:

Whenever possible, DELETE first, UPDATE second, INSERT third

I came across a stored procedure that was operating on a table with approximately 26 columns and several million records. There was no index on the table other than the primary key which was just an IDENTITY column. The stored procedure was run approximately once per week. With each run, here’s what it was coded to do:

INSERT several thousand records marking them as “new”.
DELETE “expired” records
UPDATE “old” records

The records were mutually exclusive of each other. That is, any records that were “new” wouldn’t be affected by the UPDATE or DELETE operation.

Ask yourself why this isn’t the best approach. Where is the inefficiency? Answer: in the order of operations.

By INSERTing several thousand new records first, that means the database engine has to search through that many more records to find those to DELETE and UPDATE later. If the INSERT were performed last, that obviously saves the database engine from having to scan through all the potential new records that are being INSERTed.

Therefore, you should always DELETE first whenever possible as this will present the database engine with less records to search when performing subsequent UPDATEs.

Next, UPDATE current records before performing an INSERT whenever possible because again, that’s less records the database will have to scan through.

Finally, perform your INSERT last whenever possible.

Use @TableVars for small amounts of table data instead of #temp tables

They’re small. Fast. Require zero disk I/O.

Here’s the perfect example I came across in numerous stored procedures for one client:

CREATE TABLE #DaysOfWeek
(
	[Day] char(3)
)
INSERT INTO #DaysOfWeek([Day]) VALUES ('Sun'),('Tue'),('Fri')

 -- 
 --  a few queries with joins on #DaysOfWeek
 -- 

DROP TABLE #DaysOfWeek

CREATE TABLE #DaysOfWeek

(

[Day] char(3)

)

INSERT INTO #DaysOfWeek([Day]) VALUES ('Sun'),('Tue'),('Fri')

-- a few queries with joins on #DaysOfWeek

DROP TABLE #DaysOfWeek

I promptly modified the code as follows, which actually increased the overall stored procedure runtime performance:

-- Replaced the #table with @table! Why waste time on
 -- disk I/O for 3 records?? Seriously!
DECLARE @DaysOfWeek TABLE 
(
	[Day] char(3)
)
INSERT INTO @DaysOfWeek([Day]) VALUES ('Sun'),('Tue'),('Fri')

-- Replaced the #table with @table! Why waste time on

-- disk I/O for 3 records?? Seriously!

DECLARE @DaysOfWeek TABLE

(

[Day] char(3)

)

INSERT INTO @DaysOfWeek([Day]) VALUES ('Sun'),('Tue'),('Fri')

Eliminate Joins On Massive Tables When Most Columns Aren’t Used

I was modifying a stored procedure which was updating fact tables in a data warehouse in Database1. This stored procedure had several joins to a significantly larger, staging table (3 dozen columns, several million records, no indexes on the columns being joined on or selected) in Database2 on the same server.

In this scenario, I managed to shave several minutes off the stored procedure runtime by creating a smaller #temptable from the joined table in Database2, using only the 3 columns that were utilized. I then created a CLUSTERED index on the one column in the #temptable that was contained in the WHERE clause.

Here’s the old code:

-- This was the original code.
 -- Several queries similar to this where only 
 -- the 3 columns from [Database2].[dbo].DB2Table1
 -- are used out of the 36.
INSERT INTO OurDataTable
SELECT DISTINCT 
	CAST(J.id as int) as id, 
	cast(J.code as CHAR(1)) as code 
FROM [Database2].[dbo].DB2Table1 AS J –has 36 columns and millions of rows 
LEFT OUTER JOIN [Database1].[dbo].DB1Table1 AS D ON J.id = D.id 
WHERE D.R_Man IS NULL 
	AND J.branch_code NOT in('44','61','63','64')

-- This was the original code.

-- Several queries similar to this where only

-- the 3 columns from [Database2].[dbo].DB2Table1

-- are used out of the 36.

INSERT INTO OurDataTable

SELECT DISTINCT

CAST(J.id as int) as id,

cast(J.code as CHAR(1)) as code

FROM [Database2].[dbo].DB2Table1 AS J –has 36 columns and millions of rows

LEFT OUTER JOIN [Database1].[dbo].DB1Table1 AS D ON J.id = D.id

WHERE D.R_Man IS NULL

AND J.branch_code NOT in('44','61','63','64')

The new code thus becomes:

-- Added for performance gains. 
 -- Only get the 3 columns we're concerned with.
 -- Takes just a few seconds
select 
	CAST(Id as int) as Id, 
	cast(Code as varchar(1)) as Code, 
	cast(Branch_Code as varchar(2)) as Branch_Code
into #tempTable
FROM [Database2].[dbo].DB2Table1 AS J 

 -- Create an index on the one column
 -- that's used in the WHERE clauses
CREATE NONCLUSTERED
	INDEX [UIX_tempTable] ON [#tempTable]([Branch_Code])
	WITH
	FILLFACTOR = 100
	,STATISTICS_NORECOMPUTE
ON [PRIMARY]

 -- Now do what we need to do joining
 -- on the smaller #temp table. 
 -- It runs significantly faster.
INSERT INTO OurDataTable
SELECT DISTINCT 
	CAST(J.id as int) id, 
	cast(J.code as CHAR(1)) as code 
FROM #tempTable AS J -- for performance. Smaller table with only the columns we need + an index
LEFT OUTER JOIN [Database1].[dbo].DB1Table1 AS D ON J.id = D.id 
WHERE D.R_Man IS NULL 
	AND J.branch_code NOT in('44','61','63','64')

-- Added for performance gains.

-- Only get the 3 columns we're concerned with.

-- Takes just a few seconds

select

CAST(Id as int) as Id,

cast(Code as varchar(1)) as Code,

cast(Branch_Code as varchar(2)) as Branch_Code

into #tempTable

FROM [Database2].[dbo].DB2Table1 AS J

-- Create an index on the one column

-- that's used in the WHERE clauses

CREATE NONCLUSTERED

INDEX [UIX_tempTable] ON [#tempTable]([Branch_Code])

WITH

FILLFACTOR = 100

,STATISTICS_NORECOMPUTE

ON [PRIMARY]

-- Now do what we need to do joining

-- on the smaller #temp table.

-- It runs significantly faster.

INSERT INTO OurDataTable

SELECT DISTINCT

CAST(J.id as int) id,

cast(J.code as CHAR(1)) as code

FROM #tempTable AS J -- for performance. Smaller table with only the columns we need + an index

LEFT OUTER JOIN [Database1].[dbo].DB1Table1 AS D ON J.id = D.id

WHERE D.R_Man IS NULL

AND J.branch_code NOT in('44','61','63','64')

See what I’ve done? It made a HUGE difference in the run time!

As an exercise to the reader, you can get the SQL code to execute faster if you add a WHERE clause to the SELECT statement when creating the #tempTable.

If a #temptable is used once, consider putting it in a join in the query to increase parallelism

Conversely, I’ve found #temptables in code where it hasn’t been advantageous. More disk I/O and resources were wasted, as well as limiting the database server’s ability to implement parallelism.

Here’s an example of code I rewrote:

-- They created a #temptable
SELECT ColA, ColB, ColC, ColD, Col4, Col5
INTO #table2
FROM [Database2].[dbo].SomeTable

 -- Lots of SQL code here, then 

 --  a join query. Only place where #table2 is used. 
SELECT a.Col1, a.Col2, a.Col3, b.ColA, b.ColB, b.ColC, b.ColD
FROM Table1 a 
inner join #table2 b on a.Col5 = b.Col5 and a.Col4 = b.Col4
inner join Table3 c on a.Col1 = c.Col1 and a.Col3 = c.Col3

 -- Some more miscellaneous SQL was here

DROP TABLE #table2

 -- -- -- -- -- -- -- -- -- -- -- -- -- -- 

 -- So instead of doing a single query to create #table2, 
 -- which forces the code to execute in serial,
 -- and also wasting time with the associated explicit Disk I/O,
 -- rewrite the query as follows to increase the server's ability
 -- to implement parallelism:
SELECT a.Col1, a.Col2, a.Col3, b.ColA, b.ColB, b.ColC, b.ColD
FROM Table1 a 
inner join 
(
	 -- While the server is performing this query for this join...
	SELECT ColA, ColB, ColC, ColD, Col4, Col5
	FROM [Database2].[dbo].SomeTable
) b on a.Col5 = b.Col5 and a.Col4 = b.Col4
 -- ...it can perform this join between Table1 and Table3
inner join Table3 c on a.Col1 = c.Col1 and a.Col3 = c.Col3

-- They created a #temptable

SELECT ColA, ColB, ColC, ColD, Col4, Col5

INTO #table2

FROM [Database2].[dbo].SomeTable

-- Lots of SQL code here, then

-- a join query. Only place where #table2 is used.

SELECT a.Col1, a.Col2, a.Col3, b.ColA, b.ColB, b.ColC, b.ColD

FROM Table1 a

inner join #table2 b on a.Col5 = b.Col5 and a.Col4 = b.Col4

inner join Table3 c on a.Col1 = c.Col1 and a.Col3 = c.Col3

-- Some more miscellaneous SQL was here

DROP TABLE #table2

-- -- -- -- -- -- -- -- -- -- -- -- -- --

-- So instead of doing a single query to create #table2,

-- which forces the code to execute in serial,

-- and also wasting time with the associated explicit Disk I/O,

-- rewrite the query as follows to increase the server's ability

-- to implement parallelism:

SELECT a.Col1, a.Col2, a.Col3, b.ColA, b.ColB, b.ColC, b.ColD

FROM Table1 a

inner join

(

-- While the server is performing this query for this join...

SELECT ColA, ColB, ColC, ColD, Col4, Col5

FROM [Database2].[dbo].SomeTable

) b on a.Col5 = b.Col5 and a.Col4 = b.Col4

-- ...it can perform this join between Table1 and Table3

inner join Table3 c on a.Col1 = c.Col1 and a.Col3 = c.Col3

Indexes on #temptables

This trick has to be determined on a case-by-case basis because basically you have to weigh up the time difference between:

Creating and loading the #temptable; then performing the queries AND
Creating and loading the #temptable; creating the index on the necessary columns; then performing the queries.

Sometimes the time and associated costs for the server to perform #1 is practically the same as performing #2. This especially happens when the #temptable as a relatively low number of records it takes just as much time to scan the table as it would to create then scan the index for the values.

My general rule of thumb for considering whether or not to do this is:

The #temptable has at least 100,000 records in it
The #temptable is in several JOINs.

Use LIKE instead of LEFT() or SUBSTRING() in WHERE clauses when Indexes are present

Unless absolutely necessary. Why? LEFT, SUBSTRING, and similar functions are just that – functions. They do not allow the database engine to take advantage of indexes.

-- Do not do this if there is an index on Column3
SELECT Column1, Column2
FROM Table1
WHERE LEFT(Column3, 1) = 'a'

 -- Do not do this either if there is an index on Column3
SELECT Column1, Column2
FROM Table1
WHERE SUBSTRING(Column3, 1, 1) = 'a'

 -- Instead, use LIKE
SELECT Column1, Column2
FROM Table1
WHERE Column3 LIKE 'a%'

-- Do not do this if there is an index on Column3

SELECT Column1, Column2

FROM Table1

WHERE LEFT(Column3, 1) = 'a'

-- Do not do this either if there is an index on Column3

SELECT Column1, Column2

FROM Table1

WHERE SUBSTRING(Column3, 1, 1) = 'a'

-- Instead, use LIKE

SELECT Column1, Column2

FROM Table1

WHERE Column3 LIKE 'a%'

As a hint, if you’re going for your SQL Server MCSE Data Platform certification, questions similar to the above scenario were on the SQL Server 2012 exams, so you should know this. 🙂

Don’t Sort Data in a Table Contrary to a Clustered Index

Obviously pay this no mind if the clustered index is solely on an identity column. Otherwise, it generally just wastes time.

Let’s say you have two columns in a table forming a clustered index as follows:

CREATE CLUSTERED INDEX IX_TheIndex 
ON TestTable 
(
      Name Desc,
      DateAdded ASC
)

CREATE CLUSTERED INDEX IX_TheIndex

ON TestTable

(

Name Desc,

DateAdded ASC

)

And if you’re lucky like me (let me roll my eyes for you at that sarcasm), you find an insert statement similar to the following:

INSERT INTO TestTable
SELECT Name, DateAdded, AnotherColumn, AndAnotherColumn, StillAnotherColumn
FROM SomeOtherTable
ORDER BY Name, DateAdded

--The above ORDER BY clause is pointless because the 
--SQL engine could determine, when generating an execution
--plan, that the clustered index could be used in whole or part.
--So give me 3 good reasons why, during the insert, the
--Name and DateAdded columns are sorted in ASC order?

INSERT INTO TestTable

SELECT Name, DateAdded, AnotherColumn, AndAnotherColumn, StillAnotherColumn

FROM SomeOtherTable

ORDER BY Name, DateAdded

--The above ORDER BY clause is pointless because the

--SQL engine could determine, when generating an execution

--plan, that the clustered index could be used in whole or part.

--So give me 3 good reasons why, during the insert, the

--Name and DateAdded columns are sorted in ASC order?

Filter by Subqueries in JOINs instead of WHERE clauses

The majority of the time, filtering data by using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. I believe this is because an SQL server engine can optimize and implement parallelism within JOIN clauses more efficiently than in WHERE clauses.

--Most of the time, a query structured like this:
SELECT a.Col1, b.Col2
FROM  Table1 a
INNER JOIN Table2 b on a.Col4 = b.Col4
INNER JOIN
(
      select Col3
      from Table3
) c on c.Col3 = a.Col4

--executes faster than a query structured like this:
SELECT  a.Col1, b.Col2
FROM  Table1 a
INNER JOIN Table2 b on a.Col4 = b.Col4
WHERE a.Col4 in
(
      select Col3
      from Table3
)

--Most of the time, a query structured like this:

SELECT a.Col1, b.Col2

FROM Table1 a

INNER JOIN Table2 b on a.Col4 = b.Col4

INNER JOIN

(

select Col3

from Table3

) c on c.Col3 = a.Col4

--executes faster than a query structured like this:

SELECT a.Col1, b.Col2

FROM Table1 a

INNER JOIN Table2 b on a.Col4 = b.Col4

WHERE a.Col4 in

(

select Col3

from Table3

)

Combine multiple UPDATE statements

From my opening example. Yes, I seriously came across this production code:

-- beginning of updates
 -- Each UPDATE requires an entire table scan! 
UPDATE stg1Table
SET AccType = 'RE'
WHERE CHARINDEX('Romeo', CatchPhrase) > 0

UPDATE stg1Table
SET AccType = 'TL'
WHERE CHARINDEX('Tango', CatchPhrase) > 0

UPDATE stg1Table
SET AccType = 'VA'
WHERE CHARINDEX('Victor', CatchPhrase) > 0

UPDATE stg1Table
SET AccType = 'WC'
WHERE CHARINDEX('Whiskey', CatchPhrase) > 0

UPDATE Stg1Table
SET AccType = 'BT'
WHERE CHARINDEX('Bravo', CatchPhrase) > 0

UPDATE Stg1Table
SET AccType = 'SC'
WHERE CHARINDEX('Sierra', CatchPhrase) > 0

UPDATE Stg1Table
SET AccType = 'EC'
WHERE CHARINDEX('Echo', CatchPhrase) > 0
 -- and yet more similar update statements were here

-- beginning of updates

-- Each UPDATE requires an entire table scan!

UPDATE stg1Table

SET AccType = 'RE'

WHERE CHARINDEX('Romeo', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'TL'

WHERE CHARINDEX('Tango', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'VA'

WHERE CHARINDEX('Victor', CatchPhrase) > 0

UPDATE stg1Table

SET AccType = 'WC'

WHERE CHARINDEX('Whiskey', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'BT'

WHERE CHARINDEX('Bravo', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'SC'

WHERE CHARINDEX('Sierra', CatchPhrase) > 0

UPDATE Stg1Table

SET AccType = 'EC'

WHERE CHARINDEX('Echo', CatchPhrase) > 0

-- and yet more similar update statements were here

Rewritten by doing some simple optimization SQL magic:

-- Rewrite the code; just do the UPDATE once!
UPDATE stg1Table
Set AccType = Case (CatchPhrase)
    When LIKE '%Romeo%' Then 'RE'
    When LIKE '%Tango%' Then 'TL'
    When LIKE '%Victor%' Then 'VA'
    When LIKE '%Whiskey%' Then 'WC'
    When LIKE '%Bravo%' Then 'BT'
    When LIKE '%Sierra%' Then 'SC'
    When LIKE '%Echo%' Then 'EC'
    -- remaining When clauses 
End

-- Rewrite the code; just do the UPDATE once!

UPDATE stg1Table

Set AccType = Case (CatchPhrase)

When LIKE '%Romeo%' Then 'RE'

When LIKE '%Tango%' Then 'TL'

When LIKE '%Victor%' Then 'VA'

When LIKE '%Whiskey%' Then 'WC'

When LIKE '%Bravo%' Then 'BT'

When LIKE '%Sierra%' Then 'SC'

When LIKE '%Echo%' Then 'EC'

-- remaining When clauses

End

‘nuf said. 🙂

I hope you enjoyed this article!

Do you have any SQL optimizing tips and techniques? If so, leave a comment and share with your fellow readers! Let’s make the SQL world a better, happier place for everyone. 🙂

Spread the love

Optimizing SQL Strategies

ByDavid Lozinski

Optimizing SQL Strategies

Whenever possible, DELETE first, UPDATE second, INSERT third

Use @TableVars for small amounts of table data instead of #temp tables

Eliminate Joins On Massive Tables When Most Columns Aren’t Used

If a #temptable is used once, consider putting it in a join in the query to increase parallelism

Indexes on #temptables

Use LIKE instead of LEFT() or SUBSTRING() in WHERE clauses when Indexes are present

Don’t Sort Data in a Table Contrary to a Clustered Index

Filter by Subqueries in JOINs instead of WHERE clauses

Combine multiple UPDATE statements

Related Post

SQL: comparing values IN vs INNER JOIN

SQL: LIKE vs SUBSTRING vs LEFT/RIGHT vs CHARINDEX

SQL: Fastest way to insert new records where one doesn’t already exist

You missed

Change your default DNS for performance and privacy upgrades

Comparing Microsoft Copilot, Copilot M365, and ChatGPT

Dating Multiple AI’s

Choosing CloudFlare over Microsoft Azure