Fastest way to insert new records where one doesn’t already exist
SQL Developers come across this scenario quite often – having to insert records into a table where a record doesn’t already exist.
The age-old technique and I suspect most common practice is doing a left join where the values are null from the table being inserted into. Even though this is simple, is it the fastest way?
That’s when this Curious Consultant started wondering what’s the fastest way to insert new records where one doesn’t already exist during a single process?
The Different Techniques
Other than the LEFT JOIN, what are other ways to accomplish this? Here are a few common ways that will be benchmarked:
Setting Things Up for the Speed Test
For testing purposes, the SQL code was executed on a Windows 2012 Server with 128GB memory, 16 core CPU, using Microsoft SQL Server 2014.
To ensure SQL Server didn’t keep any queries (or anything for that matter) cached, the following code was run before each test:
1 2 3 4 5 6 7 8 9 10 |
checkpoint go DBCC DROPCLEANBUFFERS go DBCC FREESESSIONCACHE go DBCC FREEPROCCACHE go DBCC FREESYSTEMCACHE('ALL') go |
Two tables are created, and populated with up to 50 million rows. The primary keys are the same:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
--create main table CREATE TABLE [#table1] ( [Id] int NOT NULL ,[guidd] uniqueidentifier not null ,[TimeAdded] Datetime null --just to have extra data ,[ExtraData] uniqueidentifier null --just to have extra data CONSTRAINT [pk_table1] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE= OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 93) ON [PRIMARY] ) ON [PRIMARY] --create table we'll be adding data from CREATE TABLE [#table2]( [Id] int NOT NULL ,[guidd] uniqueidentifier NOT NULL ,[TimeAdded] datetime null --just to have extra data ,[ExtraData] uniqueidentifier null --just to have extra data CONSTRAINT [pk_table2] PRIMARY KEY CLUSTERED ( [Id] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE= OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 93) ON [PRIMARY] ) ON [PRIMARY] |
The source code is at the end of this blog, but here are the full code snippets for each technique:
# |
Code | ||
1) Insert Where Not Exists |
|
||
2) Merge |
|
||
3) Insert Except |
|
||
4) Left Join |
|
The Results Are In!
Here are the results, in milli-seconds, as to how long each technique took to execute.
The winner is highlighted in green.
Technique |
Time to execute, in milliseconds, over 3 runs: |
||||
# Records: |
50,000 |
500,000 |
5,000,000 |
50,000,000 |
500,000,000 |
1: Insert Where Not Exists |
166, 166, 163 |
1006, 990, 1066 |
16183, 16386, 16280 |
168896, 169770, 171383 |
355310, 348843, 348780 |
2: Merge |
243, 233, 233 |
2426, 2460, 2473 |
24813, 24610, 25000 |
255043, 255043, 255213 |
463486, 513423, 504380 |
3: Insert Except |
173, 170, 173 |
850, 826, 870 |
16220, 16140, 16160 |
104223, 106333, 110643 |
221620, 221946, 222553 |
4: Left Join |
140, 136, 140 |
716, 726, 760 |
15436, 15406, 15436 |
167053, 160610, 169553 |
344873, 315123, 342326 |
The Way To Go
Who else thought the LEFT JOIN would be the dominant player up through 500,000,000 records and beyond?
I never thought the LEFT JOIN would slow down as much as it did when hitting 50,000,000 records and more.
While it seems to dominate for less than 50,000,000, once that threshold is hit the INSERT EXCEPT option proved to be king.
Note that results may vary by server because the optimization engine on every SQL Server Instance could do it differently. The SQL code is below, so feel free to use it as a basis for conducting your own performance benchmarks.
Leave a comment and share the knowledge if you have any suggestions or other ways of doing this!
The SQL Source Code Used
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 |
checkpoint go DBCC DROPCLEANBUFFERS go DBCC FREESESSIONCACHE go DBCC FREEPROCCACHE go DBCC FREESYSTEMCACHE ('ALL') go print '#### Started at: ' +Cast(GETDATE() as varchar) DECLARE @counter int = 0 DECLARE @max int = 50000000 DECLARE @start datetime DECLARE @end datetime DECLARE @unique uniqueidentifier DECLARE @RandomDate datetime DECLARE @RandomGuid uniqueidentifier DECLARE @Results TABLE ( [Technique] varchar(50) ,[TotalRecords] int ,[TimeTaken] varchar(50) ) --create main table CREATE TABLE [#table1]( [Id] int NOT NULL ,[guidd] uniqueidentifier not null ,[TimeAdded] Datetime null --just to have extra data ,[ExtraData] uniqueidentifier null --just to have extra data CONSTRAINT [pk_table1] PRIMARY KEY CLUSTERED ( [Id] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 93) ON [PRIMARY] ) ON [PRIMARY] --create table we'll be adding data from CREATE TABLE [#table2]( [Id] int NOT NULL ,[guidd] uniqueidentifier not null ,[TimeAdded] datetime null --just to have extra data ,[ExtraData] uniqueidentifier null --just to have extra data CONSTRAINT [pk_table2] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE= OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 93) ON [PRIMARY] ) ON [PRIMARY] SET NOCOUNT ON --populate the tables print 'Populating the temp tables: ' + CAST(GETDATE() as varchar) WHILE (@counter <= @max) BEGIN --get the unique SET @unique = (SELECT NEWID()) SET @RandomDate = (SELECT GETDATE()) SET @RandomGuid =(SELECT NEWID()) --this table gets every record so we know we'll insert some INSERT INTO #table2 (Id, guidd, TimeAdded, ExtraData) VALUES (@counter, @unique, @RandomDate, @RandomGuid) --this table gets every other record so we know there are some to be inserted IF (@counter % 2 = 0) BEGIN INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData) VALUES (@counter, @unique, @RandomDate, @RandomGuid) END SET @counter = @counter + 1 END print 'Finished populating the temp tables: ' + CAST(GETDATE() as varchar) SET NOCOUNT OFF --do the inserts to see what's fastest --insert where not exists SET @start = (select Getdate()) INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData) SELECT Id, guidd, TimeAdded, ExtraData FROM #table2 WHERE NOT EXISTS (Select Id, guidd from #table1 WHERE #table1.id = #table2.id) SET NOCOUNT ON SET @end = (select Getdate()) INSERT INTO @Results VALUES ('Insert Where Not Exists', @max, CAST(DATEDIFF(ms, @start, @end) as varchar)) print CAST(DATEDIFF(ms, @start, @end) as varchar) + ' milliseconds for insert where not exists' SET NOCOUNT OFF DELETE FROM #table1 WHERE Id % 2 = 1 --merge SET @start = (select Getdate()) MERGE #table1 as [Target] USING (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source] (id, guidd, TimeAdded, ExtraData) on [Target].id = [Source].id WHEN NOT MATCHED THEN INSERT (id, guidd, TimeAdded, ExtraData) VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData); SET NOCOUNT ON SET @end = (select Getdate()) INSERT INTO @Results VALUES ('Merge', @max, CAST(DATEDIFF(ms, @start, @end) as varchar)) print CAST(DATEDIFF(ms, @start, @end) as varchar) + ' milliseconds for merge' SET NOCOUNT OFF DELETE FROM #table1 WHERE Id % 2 = 1 --insert except SET @start = (select Getdate()) INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData) SELECT id, guidd, TimeAdded, ExtraData from #table2 EXCEPT SELECT id, guidd, TimeAdded, ExtraData from #table1 SET NOCOUNT ON SET @end = (select Getdate()) INSERT INTO @Results VALUES ('Insert Except', @max, CAST(DATEDIFF(ms, @start, @end) as varchar)) print CAST(DATEDIFF(ms, @start, @end) as varchar) + ' milliseconds for insert except' SET NOCOUNT OFF DELETE FROM #table1 WHERE Id % 2 = 1 --left join SET @start = (select Getdate()) INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData) SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData FROM #table2 LEFT JOIN #table1 on #table1.id = #table2.id WHERE #table1.id is null SET NOCOUNT ON SET @end = (select Getdate()) INSERT INTO @Results VALUES ('Left Join', @max, CAST(DATEDIFF(ms, @start, @end) as varchar)) print CAST(DATEDIFF(ms, @start, @end) as varchar) + ' milliseconds for left join' SET NOCOUNT OFF drop table #table1 drop table #table2 print '#### Finished at: ' + Cast(GETDATE() as varchar) Select * From @Results Order By Technique |