When is SELECT INTO not minimally logged?
If you had asked me that question a month ago i would have said “Never, it is always minimally logged and this behaviour is documented in BOL’. But now I know better…
Recently i have been working on an application migration from SQL 2008 R2 –> SQL 2014 and we found a huge drop in performance for some SELECT INTO statements. I traced the issue down to the fact the operation was being fully logged and then we engaged Microsoft to work out why.
What we found is that there is a change in behaviour which was first introduced in SQL 2012 where by if your using a scalar function in conjunction with SELECT INTO there are scenarios where it will be fully logged. To demonstrate this i would like to walk you through a demo.
For the demo i have used AdventureWorksDW2008R2 which you can download here.
IF EXISTS (SELECT name from sys.sysobjects where name = 'FactResellerSales_Into' and type = 'U') DROP TABLE [dbo].[FactResellerSales_Into] --Make sure we have not got any old records for the table we are looking for. CHECKPOINT --Perform select into caling a scalar function SELECT dbo.udfTwoDigitZeroFill(SalesOrderLineNumber) as 'Function' ,* INTO [dbo].[FactResellerSales_Into] from [dbo].[FactResellerSales] --Pull out relevant log records from the transaction logs for the select into operation select Operation ,Context ,count(Operation) as 'Count ops' ,SUM([Log Record Length]) as 'Sum Record Length' from fn_dblog(null,null) Where AllocUnitName = 'dbo.FactResellerSales_Into' group by Operation,Context
Load the database to a SQL 2008 R2 server and run the code above, you will get results similar to below.
The results are the output of the log records for the insert operation and we can determine that the operation was minimally logged as we only see GAM, IAM & PFS operations.
Now load the database to SQL 2012 or SQL 2014 and repeat.
In this set of results we can see there are records relating to recording the actual data being inserts which means it is not minimally logged!!!
Fortunately you can get back to SQL 2008 R2 behaviour if you SCHEMA BIND your function so execute the following.
ALTER FUNCTION [dbo].[udfTwoDigitZeroFill] (@number int) RETURNS char(2) WITH SCHEMABINDING AS BEGIN DECLARE @result char(2); IF @number > 9 SET @result = convert(char(2), @number); ELSE SET @result = convert(char(2), '0' + convert(varchar, @number)); RETURN @result; END
Now execute the first query again and you will get result similar to the ones below.
As you can from the above we are minimally logged again.
It has been previously written that SCHEMA Binding a function improves performance and I have seen the benefits discussed in the post. Now you have another important reason to introduce schema binding to your functions.
Hi there,
Thanks for your article.
Why do you still have a LCX_HEAP after you activated the SCHEMA_BINDING ?
Thanks
Hi,
Because a single page still has to be allocated for the newly created table (heap) but SQL does not have to allocate pages for the data that is then inserted into it which is what the other allocations were in the fully logged instance.
Regards
Andrew
Thanks Andrew 🙂 Have a nice day and thanks again for the good post!