
Archive for the ‘Internals’ Category

When is SELECT INTO not minimally logged?

June 20, 2015

If you had asked me that question a month ago I would have said "Never; it is always minimally logged, and this behaviour is documented in BOL". But now I know better…

Recently I have been working on an application migration from SQL 2008 R2 to SQL 2014, and we found a huge drop in performance for some SELECT INTO statements. I traced the issue down to the fact that the operation was being fully logged, and then we engaged Microsoft to work out why.
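If you want a quick way to check this behaviour for yourself, the sketch below (my own illustration, not from the investigation above) counts the log records a SELECT INTO generates using the undocumented sys.fn_dblog function. The target table name is made up; run it in a scratch database using the SIMPLE recovery model, where a minimally logged load produces comparatively few log records while a fully logged one produces at least one per row.

CHECKPOINT -- in SIMPLE recovery this clears the log so the counts start low

SELECT COUNT(*) AS log_records_before FROM sys.fn_dblog(NULL, NULL)

-- Hypothetical target table; 100,000 rows is enough to make the difference obvious
SELECT TOP (100000) o1.object_id AS id, o1.name
INTO dbo.SelectIntoTarget
FROM sys.objects AS o1 CROSS JOIN sys.objects AS o2

SELECT COUNT(*) AS log_records_after FROM sys.fn_dblog(NULL, NULL)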


The overhead of a non-unique clustered index

October 20, 2008

So, we all know that if we create a clustered index that is not unique we will incur a 4-byte overhead, right? Well, not always, because as usual, it depends…

When you create a non-unique clustered index, SQL Server must still maintain uniqueness, so it adds a hidden 4-byte column which is populated for each non-unique row (not every row). What many people may not realise is that this is actually a variable-length column, so if your table has no other variable-length columns you also incur another 4 bytes to maintain the variable-length offset data, giving you a total of 8 bytes per row instead of 4.

A few bytes may not sound like much, but when dealing with multi-billion-row tables it soon adds up, so it's important to know how the space consumption breaks down.
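If you want to see the overhead for yourself, here is a minimal sketch (the table and index names are my own invention) that compares the average leaf record size of a unique and a non-unique clustered index over otherwise identical fixed-length rows:

CREATE TABLE dbo.UniqueCI (id INT NOT NULL, pad CHAR(100) NOT NULL)
CREATE TABLE dbo.NonUniqueCI (id INT NOT NULL, pad CHAR(100) NOT NULL)

CREATE UNIQUE CLUSTERED INDEX cix_u ON dbo.UniqueCI (id)
CREATE CLUSTERED INDEX cix_nu ON dbo.NonUniqueCI (id)

-- Unique keys in one table, a single duplicated key in the other, so almost
-- every row in dbo.NonUniqueCI carries a non-null uniqueifier
INSERT dbo.UniqueCI SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x' FROM sys.all_columns
INSERT dbo.NonUniqueCI SELECT TOP (1000) 1, 'x' FROM sys.all_columns

-- Expect a difference of roughly 8 bytes per row: 4 for the uniqueifier itself
-- plus 4 of variable-length offset data, as neither table has any other
-- variable-length columns
SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.UniqueCI'), NULL, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT OBJECT_NAME(object_id), avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.NonUniqueCI'), NULL, NULL, 'DETAILED')
WHERE index_level = 0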

Below is an extract from the Books Online 2005 topic "Estimating the Size of a Clustered Index":

"The uniqueifier is a nullable, variable-length column. It will be nonnull and 4 bytes in size in rows that have nonunique key values. This value is part of the index key and is required to make sure that every row has a unique key value."

It's great to see that this hidden column is now documented, but a bit of additional clarity around its variable-length storage overhead would be nice.

Finally, I thought I would visualise this hidden data overhead for you with a screenshot from Danny's awesome Internals Viewer.

[Screenshot: Internals Viewer showing the hidden uniqueifier column stored in the row's variable-length data]

Update: Thanks to Christian Bolton for clarifying that the overhead is incurred for each non-unique row, which I have now reflected in the post.

Server level VLF report

July 25, 2007

I read Tony Rogerson's blog on Virtual Log Files today, and it reminded me that I really should knock up a little report to list all the databases on a server and the number of VLFs per database. Since I had been busy writing some other operational reports I was in the right frame of mind, so I knocked up what you see below.

The Report code

-- Holds the raw DBCC LOGINFO output (one row per VLF)
CREATE TABLE #VLFS (fileid INT, filesize BIGINT, startoffset BIGINT, fseqno BIGINT, status INT, parity INT, createlsn VARCHAR(1000))

-- Holds one row per database with its VLF count
CREATE TABLE #Results (SRV_Name NVARCHAR(500), Database_Name NVARCHAR(500), VLFS INT)

-- For each database: capture its DBCC LOGINFO output, count the VLFs, then clear down
EXEC MASTER.dbo.sp_msforeachdb
    @command1 = 'USE ? INSERT INTO #VLFS EXEC(''DBCC LOGINFO WITH TABLERESULTS'')',
    @command2 = 'INSERT INTO #Results SELECT @@SERVERNAME, ''?'', COUNT(*) FROM #VLFS',
    @command3 = 'TRUNCATE TABLE #VLFS'

-- For this example I've just returned the results, but you can
-- just as easily write the results to a local or central server.
-- I write mine to a central server for reporting on.
SELECT * FROM #Results

DROP TABLE #VLFS
DROP TABLE #Results

Example output

[Table: one row per database showing SRV_Name, Database_Name and the VLF count]

For me, the beauty of this little report is that I have set it up so that it runs on all 150 of my servers distributed around the world and logs to my central logging server. From there I get one report that tells me which databases have excessive VLFs and the servers they are hosted on. Having said that, in a well-managed environment I should never find any databases with lots of VLFs…
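If you want to do the same, the final SELECT in the script can be swapped for something like the sketch below, where [CentralServer] is a linked server and DBA.dbo.VLFCounts an existing logging table; both names are made up for illustration:

-- Ship the results to a central logging table over a linked server
INSERT INTO [CentralServer].DBA.dbo.VLFCounts (SRV_Name, Database_Name, VLFS, Collected_At)
SELECT SRV_Name, Database_Name, VLFS, GETDATE()
FROM #Results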

You may also be wondering, "How many VLFs are too many?". Tony did not cover this, but Kimberly Tripp did in a post she put up a few years ago (see point 8), which was when I first learned about VLFs. I don't, however, think this is a hard and fast number, and obviously the performance gains will be smaller if you're not far off it.

The last thing I am going to mention in this post is part of the code I used in the report: the undocumented procedure sp_msforeachdb. This is a great little procedure that will cycle through your databases executing up to three commands. To use it, you put a '?' wherever you would have put a database name, and it's as simple as that! Incidentally, there is also an sp_msforeachtable.
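For example, this one-liner (my own illustration) returns a row count for every table in the current database; the '?' is expanded to each table name in turn:

EXEC sp_msforeachtable
    @command1 = 'SELECT ''?'' AS Table_Name, COUNT(*) AS Row_Count FROM ?'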

What is that SPID doing?

March 27, 2006

This is certainly not a new topic, but it is something that I felt was worth revisiting and reminding people about, especially because I find many people are not aware of the ability to extract the statement currently executing within a stored procedure.

So, prior to SQL 2000 SP3, when we wanted to know what code a SPID was executing we had to use DBCC INPUTBUFFER(). The problem with this command is that it's restricted to the first 256 characters and only returns the outermost statement executing. That means that if we have stored procedure 1 calling stored procedure 2, the input buffer will only tell us about stored procedure 1, and it may even be truncated!

I clearly remember reading the SP3 readme and thinking Christmas had come early when I found the entry about a new function being introduced called ::fn_get_sql. I've got to say, this is still the best present yet! It has made troubleshooting so much easier. Why? Well, read on.

Running the code below will now show you the innermost statement executing, and we now get up to 4000 characters. In the case of a stored procedure, you see the whole procedure.

DECLARE @Handle BINARY(20) -- sql_handle is a 20-byte binary value
SELECT @Handle = sql_handle FROM MASTER..sysprocesses WHERE spid = 109 -- CHANGE SPID NUMBER
SELECT * FROM ::fn_get_sql(@Handle)

The part that really got me? When it returns the code of a stored procedure, you could be left wondering which statement within the stored procedure is executing. Fear not, they have that covered too! All you have to do is run the modified code below and the exact statement causing you a problem is identified.

DECLARE @Handle BINARY(20), @start INT, @end INT

-- stmt_start and stmt_end are byte offsets into the Unicode batch text
SELECT @Handle = sql_handle, @start = stmt_start, @end = stmt_end
FROM MASTER..sysprocesses
WHERE sql_handle <> 0x00 AND spid = 226 -- CHANGE SPID NUMBER

IF NOT EXISTS (SELECT * FROM ::fn_get_sql(@Handle))
    PRINT 'Handle not found in cache'
ELSE
    SELECT 'Current statement' =
        SUBSTRING(TEXT, (@start + 2) / 2,
            CASE @end
                WHEN -1 THEN DATALENGTH(TEXT) -- -1 means the statement runs to the end of the batch
                ELSE (@end - @start + 2) / 2
            END)
    FROM ::fn_get_sql(@Handle)

So, now that I've reminded you of this really useful tool, it's time for a few "Battle Tips".

1) Make sure in Query Analyzer that your "maximum characters per column" is set to 4000; otherwise you are only going to get 256…
2) If SQL generates a "zero cost plan" it will not put it in the cache, because it's not worth it. As it's not in the cache, ::fn_get_sql won't return any information.
3) This one comes with a health warning… If for some reason you need SQL to cache the zero-cost plans so that ::fn_get_sql will always work, you can enable trace flag 2861. I would recommend you do not use this in production, as the overhead is substantial and Profiler should be used instead, but some people might find it beneficial on a dev box.
4) I've not adjusted the script above to deal with parallel SPIDs, where you get multiple handles for one SPID (a rough sketch of one way round that is below). Just a note, as I keep on forgetting. 😀
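For completeness, here is that sketch for point 4 (mine, not part of the original script): loop over every distinct non-zero handle the SPID currently holds and fetch the text for each one.

DECLARE @Handle BINARY(20)

DECLARE handle_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT sql_handle
    FROM MASTER..sysprocesses
    WHERE spid = 226 AND sql_handle <> 0x00 -- CHANGE SPID NUMBER

OPEN handle_cursor
FETCH NEXT FROM handle_cursor INTO @Handle

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT * FROM ::fn_get_sql(@Handle) -- one result set per handle
    FETCH NEXT FROM handle_cursor INTO @Handle
END

CLOSE handle_cursor
DEALLOCATE handle_cursor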

There are many uses for this function, and I encourage you to explore them. SQL 2005 makes life even easier through DMVs, but that's another blog……
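As a taster of that future blog, the SQL 2005 DMV route looks something like this sketch (my assumption as to the shape of it, with sys.dm_exec_requests and sys.dm_exec_sql_text standing in for sysprocesses and ::fn_get_sql):

SELECT r.session_id,
    SUBSTRING(t.text, (r.statement_start_offset / 2) + 1,
        ((CASE r.statement_end_offset
              WHEN -1 THEN DATALENGTH(t.text) -- -1 means the statement runs to the end of the batch
              ELSE r.statement_end_offset
          END - r.statement_start_offset) / 2) + 1) AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 109 -- CHANGE SESSION ID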