SSAS Deployment essentials

March 30, 2011

Over recent years I have enjoyed the privilege of working on a number of different SSAS deployments. Some are huge, some are complex, some are both huge and complex, and most interestingly they all behave differently.

What I want to share today is what I consider to be essential for an SSAS installation. This covers what I expect to see installed to complement SSAS, the configuration settings that should be changed, and the literature that should be at your fingertips.

Read more…

SSAS 2008 R2 – Little Gems

December 3, 2010

I have spent the last few days working with SSAS 2008 R2 and noticed a few small enhancements which many people probably won't notice, but I will list them here and explain why they are important to me.

New profiler events

Commit: This is a new subclass event for "Progress Report End". It represents the elapsed time taken for the server to commit your data. It is important because for the duration of this event a server level lock will be in place, blocking all incoming connections and causing timeouts. You can read about the server level lock here. Prior to 2008 R2 the only way to calculate the duration of the commit was to subtract the time of the event immediately preceding the batch completion event from the batch completion event's end time. You want to be looking out for this event if people are complaining about connectivity…
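
As a rough sketch of that manual calculation, assuming the profiler trace has been saved to a table (the table name here is hypothetical):

-- Sketch only: assumes the SSAS profiler trace was saved to a table that includes the
-- EventClass, EventSubclass, StartTime, EndTime and TextData columns.
SELECT EventClass, EventSubclass, StartTime, EndTime, TextData
FROM dbo.ssas_processing_trace            -- hypothetical table name
ORDER BY StartTime;
-- The commit duration is approximately the EndTime of the batch completion event
-- minus the EndTime of the event immediately preceding it.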

File Load and Save: This is a whole new category with its own events, but you must remember to tick the boxes to show all the categories otherwise it's hidden. Anyway, as soon as I saw it I thought: awesome! As the name describes, it exposes the file activity and gives a whole new level of insight into the black box known as SSAS. You may be wondering how it is useful to know what files are being accessed, but when you're trying to understand what's going on under the hood and where a bottleneck might be it's invaluable, and I have been using it these past two days whilst working on a problem which I will discuss in a future post.

VertiPaq SE Query: Not much to say here other than I would expect them to be fast…

Other enhancements

In August 2009 I posted about The cost of SSAS Metadata management and discussed a hidden performance penalty. Well, I am pleased to say that Microsoft have fixed this particular problem: when you process a partition it will no longer check for dependencies across every partition in the database. Now, before you get excited and decide to create thousands of partitions and throw away your partition merging scripts, you should wait for the post I alluded to earlier, as there are other significant costs and penalties for having too much metadata…

Last but not least, a big thank you to Microsoft for improving the time it takes to expand an object in SSMS! With SSAS 2005 and 2008 I could be waiting anywhere from 30 seconds to a couple of minutes to expand a node in SSMS, which is very frustrating when you're in a rush, but with SSAS 2008 R2 node expansion is now instantaneous! So thank you again; it may be a small fix but it's a big time saver!

SSAS Native vs .NET Provider

November 30, 2010

Recently I was investigating why a new server, which is in its parallel running phase, was taking significantly longer to process the daily data than the server it's due to replace.

The server has SQL Server and SSAS installed, so the problem was unlikely to be in the network transfer as it's using shared memory. As I dug around the SQL DMVs I noticed in sys.dm_exec_connections that the SSAS connection had a packet size of 8000 bytes instead of the usual 4096 bytes, and from there I found that the data source had been configured with the wrong provider. What was really interesting, and the point of this blog, is the performance difference, which I have shown below.

Provider                        Rows per second
.NET SqlClient Data Provider    30,000
SQL Server Native Client 10     240,000

That's right! For a single partition, the native client was able to process 240,000 rows per second, whereas the .NET client maxes out at 30,000. That means the SQL Native Client delivers eight times the throughput! I knew that the .NET providers were slower, but I had never gathered any metrics before. If you're looking after an SSAS server I would definitely recommend taking a few minutes to check which provider is configured in the data source.
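
If you want to make the same check from the relational side, here is a minimal sketch against the DMVs mentioned above (the packet size is only a hint, not a definitive provider check):

-- Spot connections with an unusual network packet size and see which program owns them.
SELECT  s.session_id,
        s.program_name,        -- processing connections typically identify themselves here
        c.net_transport,
        c.net_packet_size      -- 8000 bytes was the giveaway in the case above
FROM    sys.dm_exec_sessions    AS s
JOIN    sys.dm_exec_connections AS c ON c.session_id = s.session_id
WHERE   s.is_user_process = 1;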

Another point to consider is that you may have a custom solution doing your ETL that utilises the .NET providers. This would also be impacted by the .NET provider's throughput limits, and a switch over to SSIS could dramatically improve your ETL.

Server Side Aliases

November 16, 2010

Over the years I have come across a few situations where server-side connections to SQL Server fail when you use a DNS alias that points back to the server you are initiating the connection from, yet you can connect remotely without any problem.

It's an annoying problem with a very unhelpful error message that changed between versions of SQL Server. In SQL 2000 you are presented with:

"Login failed for user ‘(null)’. Reason: Not associated with a trusted SQL Server connection."  and in SQL 2005 + SQL 2008 its “Login failed. The login is from an untrusted domain and cannot be used with Windows authentication”.

You will also see event ID 537 in the security log.

One of the most common reasons a system is set up with an alias pointing back at itself is that a consolidation has taken place and you don't want to change the connection strings. However, some people simply got burnt when Microsoft first released the security patch that introduced this change, and I still find people being burnt by it today.

Cause

NTLM reflection protection was introduced as part of security fix MS08-068. It causes a local authentication failure when using a DNS alias, which bubbles up and becomes the error described above.

The relevant Microsoft articles are MS08-068 and http://support.microsoft.com/kb/926642; the cause, extracted from the latter, is:

This problem occurs because of the way that NT LAN Manager (NTLM) treats different naming conventions as remote entities instead of as local entities. A local authentication failure might occur when the client calculates and caches the correct response to the NTLM challenge that is sent by the server in local "lsass" memory before the response is sent back to the server. When the server code for NTLM finds the received response in the local "lsass" cache, the code does not honour the authentication request and treats it as a replay attack. This behaviour leads to a local authentication failure.

Solution

You either need to use the local name rather than the DNS alias, or follow the steps described in the resolution sections of the articles to disable the protection entirely or for a specific alias.

SSAS <PreAllocate>: What you should know

July 18, 2010

Preallocating memory for SSAS running on Windows 2003 is a good thing but as with all good things it is essential to know the behavioural changes you will experience, some of which may not be so obvious.

My observations relate to SSAS 2008 SP1 CU8 running on Windows 2003 SP2.

Why PreAllocate?

In my opinion there are two reasons, which I detail below.

  • The first is the widely stated performance reason surrounding Windows 2003 memory management. In a nutshell, Windows 2003 did not scale well with many small memory allocations due to fragmentation and similar overheads, so allocate it up front. Life gets better in Windows 2008, as detailed by the SQLCAT team.
  • The second reason is to ensure SSAS is going to get its slice of the memory, which is very important if you're not running on a dedicated SSAS box.

So, what should I know?

  • When the service starts (don't forget server startup), if you have assigned "Lock Pages in Memory" to your service account, expect your server to be totally unresponsive for a period of time. Do not panic; the duration of the freeze depends on the amount of memory preallocated, but once it's done the server becomes responsive again. Make sure the people working with the server know this…
  • Never set PreAllocate equal to or greater than <LowMemoryLimit>, because if you do the memory cleaner thread will spin up and remove data pretty much as soon as it gets into memory. This will seriously hurt performance, as you're effectively disabling any caching.
  • The shrinkable and non-shrinkable perfmon memory counters are no longer accurate. The counters still have value when troubleshooting, but you must factor in that at least their starting points are wrong.
  • When a full memory dump occurs, that dump will be at least the size of the preallocated memory. So, if you preallocate 40 GB but SSAS has only written to 2 GB of memory, it's still going to be a 40 GB dump, so make sure you have the disk space! Hopefully this is not a situation you find yourself in very often.

I hope you find this information useful!

SSAS – Synchronisation performance

April 8, 2010

I've always thought of SSAS synchronisation as a clever file mirroring utility built into SSAS, and I have never considered the technology as bringing any performance gains to the table. So, it's a good job I like to revisit areas… 🙂

I decided to compare the performance of robocopy and SSAS synchronisation between two Windows 2003 servers running SSAS 2008 SP1 CU7 with 1 Gb network links. For the robocopy of the data directory I used the SQLCAT robocopy script. The results are shown below.

                            SSAS Sync    Robocopy
Full DB (138 GB)            58 min       96 min
Single Partition (10 GB)    3 min 37 s   7 min 46 s

The Full DB copy is as it says: no data is present on the target, so synchronise the lot.

Single Partition is where, after the full database is processed, I then process a single partition (and yes, the partition is around 10 GB) and then run the synchronisation.

The results left me gobsmacked! At first I simply could not believe that the SSAS sync could copy the data that much faster than robocopy, but I repeated my tests a number of times and it does!

Yes, SSAS synchronisation came out roughly 65% faster than robocopy for the full database and more than twice as fast for the single partition… I've got to say that when Microsoft did the rewrite for 2008 they certainly did a fine job!

Before I finish up with this blog entry I must say there are always considerations as to whether synchronisation is appropriate for you, and I will be posting soon about all the different options and the considerations.

Exploring backup read IO performance

January 24, 2010

I was recently exploring how to increase the backup read throughput on one of our SQL Servers. Below are some interesting facts I found.

I would say that one of the most important reminders to come out of the exercise is: do not assume that two databases being backed up on the same server using an identical backup command means that the processes are identical under the hood.

  1. Backup read threads are spawned one per physical device used by the database. (This is documented in Optimising Backup & Restore Performance in SQL Server.)
  2. Multiple database files on one disk will not increase throughput, because one disk = one thread and the thread works through the database files one at a time.
  3. Backup read buffers are evenly distributed across the number of read threads.
  4. Backups are pure IO operations; they do not read pages from the buffer pool.
  5. When passing in @MaxTransferSize it appears to be a suggestion rather than a directive: SQL Server will assign the requested value if it can, otherwise it will pick a lower value.

So, quite a few statements there… Where is the proof? Well, the best find had to be trace flag 3213, which outputs information regarding the backup decisions made. Below is an extract of this output, which I will then talk through (an example backup command that enables the flag follows the annotated output).
    2010-01-22 12:00:02.45 spid78      Backup/Restore buffer configuration parameters
    2010-01-22 12:00:02.45 spid78      Memory limit: 32765MB
    2010-01-22 12:00:02.45 spid78      Buffer count:               40
    2010-01-22 12:00:02.45 spid78      Max transfer size:          448 KB
    2010-01-22 12:00:02.45 spid78      Min MaxTransfer size:       64 KB
    2010-01-22 12:00:02.45 spid78      Total buffer space:         17 MB
    2010-01-22 12:00:02.45 spid78      Buffers per read stream:    10
    2010-01-22 12:00:02.45 spid78      Buffers per write stream:   8
    2010-01-22 12:00:02.45 spid78      Tabular data device count:  4
    2010-01-22 12:00:02.45 spid78      FileTree data device count: 0
    2010-01-22 12:00:02.45 spid78      Filesystem i/o alignment:   512
  • Max transfer size: I actually asked for 1 MB but only got 448 KB. Additionally, I noticed that where I kick off multiple backups (all requesting 1 MB) the transfer size tends to decrease the more backups you have. So, no two backups are necessarily the same.
  • Buffer distribution: I asked for 40 buffers. The database being backed up has data devices on 4 physical disks, so it gets 4 read threads. Buffers per read stream is 10, which is 40 buffers / 4 threads.
  • Read threads: the database backed up had data files on 4 physical disks. This is exposed as the Tabular data device count and confirms the statement in point 1 that you get one read thread per physical device, as documented by MS.
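
Here is a minimal sketch of the kind of backup command used for this sort of test; the database name, path and values are illustrative, and trace flag 3605 is used to route the 3213 output to the error log:

-- Route trace output to the error log and enable backup configuration output.
DBCC TRACEON (3605, -1);
DBCC TRACEON (3213, -1);

-- Illustrative backup; the requested MAXTRANSFERSIZE and BUFFERCOUNT are only
-- suggestions and SQL Server may settle on lower values, as discussed above.
BACKUP DATABASE MyBigDb                      -- hypothetical database name
TO DISK = N'E:\Backups\MyBigDb_full.bak'     -- hypothetical path
WITH BUFFERCOUNT = 40,
     MAXTRANSFERSIZE = 1048576;              -- 1 MB requested

DBCC TRACEOFF (3213, -1);
DBCC TRACEOFF (3605, -1);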

So, what about statements 2 and 4? Well, I monitored the reads to the individual files using sys.dm_io_virtual_file_stats and took a number of snapshots whilst performing a backup. There are plenty of scripts you can download to take the snapshots yourself, such as this one (a minimal snapshot sketch also follows the bullets below). Once the backup completed I looked at the time slices and you can see the following.

  • Total MB read during the backup = total data held in the file. From this I drew the conclusion that it's not reading any of the data held in the buffer pool, which makes a lot of sense as the backup includes the transaction log.
  • Querying the statistics at different time intervals you see the first data file's MBs growing, and the second data file's MBs don't start growing until the first is complete; hence it's going one file at a time. However, if you have multiple files on multiple disks you do see one file on each disk being read from. I've not mentioned increasing the number of backup devices or changing block sizes as my case specifically did not call for it, but you can read about that here.
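
A minimal snapshot sketch (the database name is illustrative) looks like this; run it repeatedly while the backup executes and diff consecutive samples:

-- Cumulative bytes read per file; compare samples over time to see which file is active.
SELECT  GETDATE()                          AS sample_time,
        DB_NAME(vfs.database_id)           AS database_name,
        vfs.file_id,
        vfs.num_of_bytes_read / 1048576.0  AS mb_read
FROM    sys.dm_io_virtual_file_stats(DB_ID('MyBigDb'), NULL) AS vfs;   -- hypothetical database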

The last thing I want to say, since I have touched on single-threaded backup reads, is that I'm keen not to spawn any new urban legends. Whilst this is true for backups in the context of one thread per physical disk device, that's it! It's worth reading this article about urban legends around SQL Server threads.

Categories: Backup, Performance, SQL Server

SSAS 2008 – INI files and in-place upgrades

September 24, 2009

Being the suspicious person I am, I wondered if there would be any differences between the MSMDSRV.ini of an instance upgraded from 2005 and that of a clean 2008 install.

Now, obviously I expect an in-place upgrade to preserve my settings and add any new ones; it should not overwrite anything, since I might have changed from the defaults for a good reason…

Below is what I found, followed by my thoughts…

Setting                      In-place upgrade value (effectively 2005)   2008 clean install value
<ServerSendTimeout>          -1                                          60000
<ServerReceiveTimeout>       -1                                          60000
<AggregationPerfLog>         1                                           (deleted)
<DataStoreStringPageSize>    8192                                        65536
<MinCalculationLRUSize>      20                                          (deleted)
<MdxSubqueries>              3                                           15

Looking at what has changed, these appear to be settings which may well have been tuned as a result of lessons learnt at Microsoft. The removal of AggregationPerfLog I suspect is cosmetic, and the setting probably does nothing since there is another one called AggregationPerfLog2 which I suspect replaces it. It's quite likely the same is true of MinCalculationLRUSize.

An important thing to take away here is that an in-place upgrade may not perform or behave the same way as a clean install, because by default its INI file is going to be different. In my case I'm checking out the impact of the settings with a view to adding a step to our upgrade path to change them to the clean install values.

One more thing to get you thinking: if the settings did change based on lessons learnt, maybe it's worth porting these back to 2005 and taking them for a spin… Test, test and test some more!

Enter the SSAS server level lock……

September 23, 2009

OK, so your reaction to the title is probably the same as mine when I found out about SSAS server level locks! I will give you the scripts to reproduce the server level lock, but first let's get down to business… 🙂

Server locks were introduced in one of the 2005 SP2 cumulative updates; at the moment all I can say is that it was pre-CU12. I'm not sure why it was introduced, but it is likely to be in response to a "feature" 🙂

Fortunately the lock only appears at the end of processing when SSAS commits its data, and commits are usually quick, so depending on when you do your processing you might never see it. So why am I so horrified by the existence of this lock, other than it being simply wrong to prevent connections to the server? Below are my concerns…

  • If a query is running when processing comes to commit, the commit must queue behind the query for a default of 30 seconds, but processing still gets the server level lock granted, meaning no one gets to connect for up to 30 seconds plus the commit time, and users get connection errors!
  • ForceCommitTimeout is the setting that controls how long a commit job waits before killing the queries ahead of it. People should now think of this setting not only as the time you're allowing queries to complete before being killed, but also as the additional duration of time you're prepared to deny users access to the server.
  • The real kick in the pants comes when you find out that there are scenarios where a query will not respond to the query cancel invoked by ForceCommitTimeout. The obvious one is when there is a bug, but there are others. This means that the commit can't kill the query, your server is effectively hung and the users are screaming. What's worse is that the sysadmin can't connect to the server to diagnose the problem because the server lock blocks them!
  • I have also seen connections error when connecting to the server due to the server level lock, which is even worse. Unfortunately I have not managed to identify the repro (yet).

Read more…

Analysis Server appears to hang…..

April 5, 2009

We had an ongoing problem whereby users would suddenly start complaining that the Analysis Server was hanging. When investigating the box we could see that there appeared to be no physical constraints: the disks were a bit above average, CPU was very low and there was plenty of memory.

When looking at the server through the SSAS Activity Viewer (part of the community samples) we could see that users were definitely queuing up, and many queries that should have returned in less than a second were hanging around for ever (30 minutes or more). It was as if we were experiencing some form of blocking…

To complement our monitoring we use ASTrace (also part of the community samples) to do real-time logging of what's happening on the SSAS server to a SQL Server, and amongst the capabilities it gives us is the ability to run a little procedure showing what MDX/XMLA is running, with durations etc. (it's much more friendly than the Activity Viewer). So, when we were experiencing the trouble, our script showed that an MDX query touching multiple partitions totalling 100s of GBs appeared to be at the head of the chain every time.

Read more…

The evils of implicit conversions

January 18, 2009

I wanted to put up a brief post showing the impact of an implicit conversion on the performance of a query (or not…). In the example I will show an implicit conversion that negatively impacts query performance and an implicit conversion that does not impact performance…

So, we need to set up the test environment using the code below.


-- ANSI_WARNINGS OFF so the longer REPLICATE results are silently truncated to 200 characters.
SET ANSI_WARNINGS OFF
--********      Create Test Data           ******************
CREATE TABLE #data_test (ukey INT IDENTITY(1,1) PRIMARY KEY, first VARCHAR(200), second VARCHAR(200))
DECLARE @first INT, @second INT
SELECT @first = 1
WHILE @first < 250000
BEGIN
    INSERT #data_test
    SELECT REPLICATE(@first, @first), REPLICATE(@first, @first)
    SELECT @first = @first + 1
END
CREATE NONCLUSTERED INDEX stuf_1 ON #data_test (first)
--********      End of Test Data            ******************

Now, with the test data in place we can run the following 2 queries and observe the differences.

/* This uses a variable declared as an NVARCHAR */
EXEC sp_executesql N'SELECT * FROM #data_test WHERE first = (@p0) ', N'@p0 nvarchar(200)',@p0 = N'1'
--Scan count 1, logical reads 3093, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

/* This uses a variable declared as a VARCHAR */
EXEC sp_executesql N'SELECT * FROM #data_test WHERE first = (@p0) ', N'@p0 varchar(200)',@p0 = '1'
--Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Below each statement is the IO incurred, and the difference on this tiny little table is over 3,000 IOs just because we used a Unicode (nvarchar) parameter instead of a non-Unicode (varchar) one; I'm sure you can imagine that on a larger table this becomes a significant overhead. So, why has this happened? Let's take a look at the plans.

Read more…

Categories: Performance, SQL Server

The overhead of a non-unique clustered index

October 20, 2008

So, we all know that if we create a clustered index that is not unique we will incur a 4 byte overhead, right? Well, not always, because as usual, it depends…

When you create a non-unique clustered index, SQL Server must maintain uniqueness, so it adds a hidden 4 byte column which is populated for each non-unique row (not every row). What many people may not realise is that this is actually a variable-length column, so if your table has no other variable-length columns you incur another 4 bytes to maintain the variable offset data, giving you a total of 8 bytes per row instead of 4.

A few bytes may not sound like much, but when dealing with multi-billion-row tables it soon adds up, so it's important to know how the space consumption breaks down.

Below is an extract from Books Online 2005 "Estimating the size of a clustered index"

"The uniqueifier is a nullable, variable-length column. It will be nonnull and 4 bytes in size in rows that have nonunique key values. This value is part of the index key and is required to make sure that every row has a unique key value."

It's great to see that this hidden column is now documented, but a bit of additional clarity around the extra variable-length storage overhead it can introduce would be nice.
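
If you want to observe the overhead yourself, a rough sketch (table and index names are made up) is to compare record sizes on a fixed-length-only table with a non-unique clustered index:

-- Rough sketch: a fixed-length-only table with a non-unique clustered index.
CREATE TABLE dbo.uniq_test (id INT NOT NULL);
CREATE CLUSTERED INDEX cix_uniq_test ON dbo.uniq_test (id);

INSERT dbo.uniq_test (id) VALUES (1);   -- first row with this key: uniquifier stays NULL
INSERT dbo.uniq_test (id) VALUES (1);   -- duplicate key: uniquifier (plus offset data) is stored
INSERT dbo.uniq_test (id) VALUES (1);

-- Compare min/max record sizes; the difference reflects the hidden overhead.
SELECT index_level, min_record_size_in_bytes, max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.uniq_test'), NULL, NULL, 'DETAILED');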

Finally, I thought I would visualise this hidden data overhead for you with a screenshot from Danny's awesome Internals Viewer.

[Screenshot: Internals Viewer showing the hidden uniquifier data on the page]

Update: Thanks to Christian Bolton for clarifying that the overhead is for each non-unique row, which I have now reflected in the post.

Changing the Data Files Location after Installation

August 10, 2008

The other day I wanted to change the "Data Files" location for a 2005 database engine installation and a 2005 Analysis Services installation (something you can specify under the advanced options during installation). I quickly found out that there appears to be no documented way to do this other than to uninstall SQL Server and install again, specifying a new location for the data files. It's also not as simple as moving your system databases, as "Data files" covers things like the server error logs, SQL Agent logs, the replication default directory, etc. As the uninstall route was not one I was prepared to go down, I sat down and worked out how to do it; below are the results.

Read more…

MS SQL Server Book of Wisdom

May 21, 2008

I was chatting with a friend today and he asked, "Have you ever seen those little books of wisdom?". We quickly decided that we could write an MS SQL Book of Wisdom; below is a summary of what ensued, for your amusement. Now, some of the statements are actually based on bad real-life advice and many we just made up. Can you tell which is which? Also, please comment if you have any good entries for the Book…

Categories: SQL Server, Uncategorized

SSAS 2005 – Server side tracing starter kit

April 7, 2008

Analysis Services 2005 (SSAS) added the ability to trace server side events, and I have used this feature a number of times. To date I had always used the Profiler GUI to do SSAS tracing, but today I found myself needing to initiate and manage a trace with scripts.

The good news is that it can be done! It did take a while to piece together how to do it, though, and I found some of the information quite a challenge to track down, so I am sharing the results with you and have attached a zip file with the necessary scripts.

So, what did I want to achieve?

  • A script that would create a trace on the server and log to a specified directory similar to the way you can with a SQL Server trace.
  • A script that would list all running traces on an analysis server.
  • A script that would destroy a named running trace, in my case the one I created.

Now, the script that creates the trace is likely to require editing each time to add new events, as the script I am attaching only captures command events. The easiest way to define your events is detailed below.

  1. Open SQL Server profiler and define the SSAS trace you require.
  2. Next script the trace by going to “File – Export – Script Trace Definition – For Analysis Services 2005”.
  3. Open the script file and copy the Events and Filter elements into my attached script, ensuring you replace the existing Events and Filter elements.

Some people might be wondering why I needed to create the script file if I can script it from Profiler. Well, Profiler only scripts the events and filters and excludes options such as LogFileName, AutoRestart, etc.

So, with the events in place you should now update the LogFileName element with your filename and path and check that the LogFileSize element is appropriate. Finally, there is a StopTime element that you can uncomment and set, which sets a time for the trace to automatically close, but do not forget it's the time at the server you are setting, not the time where you are.

With all the updating done, just run the script to create your own server side SSAS trace. It does not end there, though, because you will need to stop the trace manually if you have not enabled a StopTime. This is where "Delete Named Trace.xmla" comes in: simply update the name element and run the script to delete the trace. Unlike SQL Server, you do not need to stop and then close the trace. If you are not sure of the name of the trace you can run the script "List all server side traces.xmla", which is also useful for validating that you have removed the trace or that it auto-closed. The list of traces also gives useful information such as where the traces are outputting their results.

The trace script was amended from an example in Analysis Services Processing Best Practices, and I would definitely recommend reading that article. The other scripts I hacked together and they are very simple, as I am an XMLA novice.

I hope you find this information useful; you can locate the scripts here.

Transaction log backup deadlock

February 23, 2008

Recently we started to see deadlock errors when backing up our transaction logs. The "important" part of the error is shown below.

Could not insert a backup or restore history/detail record in the msdb database. This may indicate a problem with the msdb database. The backup/restore operation was still successful.

What this meant was that the transaction log backup was occurring, but the entry in msdb was not being made as it was being chosen as the deadlock victim. We investigated the cause of the problem because we had some processes that used this information to copy transaction logs to other servers, and we needed it to be complete.

We used trace flag 1222 to output the deadlock information to the error log and found the culprit to be a Microsoft stored procedure called "sp_delete_backuphistory" that is called by SQL 2005 maintenance plans when you use the "History cleanup task" and tick "Backup and Restore History". Having a look at the stored procedure, it was obvious why it was deadlocking, so we decided to log our findings with Microsoft. Microsoft have confirmed the bug and have stated it will be fixed in SQL 2008, but they will not be issuing a KB in the immediate future, which is one of the reasons I decided to blog about it.

I have had a look at the latest 2008 CTP and can confirm that Microsoft has updated the stored procedure to avoid the deadlocking, and I noticed they also added a non-clustered index on backup_finish_date in the backupset table (finally). I would also like to point out that the changes made to the stored procedure could easily be ported back to SQL 2005, so I'm a little surprised they have not been.
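
For anyone tempted to port the index back themselves, it is along these lines (the index name here is made up; check the shipped 2008 definition before applying anything):

-- Sketch of a supporting index on msdb's backupset table.
USE msdb;
CREATE NONCLUSTERED INDEX ix_backupset_backup_finish_date
    ON dbo.backupset (backup_finish_date);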

Workaround

To avoid this specific issue, we took the approach of identifying a generic window when transaction log backups would not be running on 95% of our server estate and changed the "History cleanup task" to run at that time. For the remaining 5% we worked out per-server windows, and now we do not see the issue on any of our servers.

A brief history of msdb backup history tables

Now, for those of you who want to know more about this problem and are wondering why it seems only to have appeared in SQL 2005, here is a brief history lesson.

Prior to SQL 2005, maintenance plans never gave the ability to delete old backup history, but the procedure "sp_delete_backuphistory" did exist. Many DBAs would find that their msdb databases were growing rather large, and if they used Enterprise Manager to do a restore it would hang for ages as it read the large backup tables. So, people would then find out about "sp_delete_backuphistory" and schedule it as a job, but quite often the first time it was run it would take ages (sometimes days) due to poor coding and the volume of data, so people then implemented their own more efficient code (Google sp_delete_backuphistory and you will see what I am talking about; for example, see here).
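
For reference, the procedure takes a single cut-off date; a minimal, illustrative call is:

-- Illustrative call: trim backup and restore history older than 90 days.
DECLARE @oldest_date DATETIME;
SET @oldest_date = DATEADD(DAY, -90, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @oldest_date;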

So, what does this lead to? Well, not many people using the MS stored procedure prior to 2005! But then SQL 2005 rocks up and we have the ability to call the procedure via the GUI. Well, let's tick that puppy! 😀 We do need to keep that msdb trim after all, and that is how we get to where we are now!

Categories: Backup, Deadlock, Errors, SQL Server

SSMS Log file viewer and Deadlock Graphs

January 23, 2008

Firstly I must say a big thank you to Microsoft for the new deadlock trace flag 1222. Compared to the output of trace flags 1204 & 1205 that you had to use in SQL 2000, it's a walk in the park to interpret.

Anyway, back to the post at hand! This is a quick FYI, as I'm not going to go through how to interpret a deadlock graph because Bart Duncan does a fantastic job of it here.

When you enable 1222, it will output the deadlock information to the error log. If you're using the log file viewer and steam on in and do your analysis, you will probably find you get your deadlock victim the wrong way round, like I did in the first cut of my analysis. Fortunately I realised my mistake, which made me look at the output again, and I realised that the output in the log is upside down! This is because the log file viewer sorts the log so that the most recent entry is first, and as such it reverses the deadlock output. I'm not aware of any way to configure the sort order of the log file viewer, and exporting the log exports it in the same order it's displayed…
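
If you have not turned the flag on before, a typical way to enable it (shown as a sketch; the -T1222 startup parameter is the option that survives restarts) is:

-- Enable deadlock reporting to the error log for all sessions until the next restart.
DBCC TRACEON (1222, -1);

-- Confirm which trace flags are currently active.
DBCC TRACESTATUS (-1);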

So, when looking at your deadlock information you have 2 options.

  • Find the occurrence of the words "deadlock-list" and read upwards.
  • Grab a copy of the error log from the server's log directory and open it in notepad.

Categories: Deadlock, Errors, SQL Server

The Job whose owner kept coming back…

December 30, 2007

I thought I would share this little quirk about the SQL Agent jobs for maintenance plans.

One of our members of staff had left and we had the usual case of a few jobs failing with:

    "Unable to determine if the owner (DOMAIN\xxx) of job <JOB_Name> has server access (reason: error code 0x534. [SQLSTATE 42000] (Error 15404))."

So, we went around and updated the job owners to one of our appropriate generic admin accounts. A few days later some of the jobs started to fail again with the same error. Since we knew we had performed the update previously, it was time to investigate how the jobs had been set back to the old user account.

It was quickly determined that the only jobs that had reverted to the old owner were the jobs created by maintenance plans, so we focused our attention there. It turns out that when you save a change to a maintenance plan, the job owners are reset to the owner of the maintenance plan. The owner of the maintenance plan will be the account used to connect to the server in SSMS when creating the plan.

With this determined, a slight variation of our fix was deployed. First we changed the job owners, and next we updated the owner of the maintenance plan using the script at the end of the post. The script is in two parts: the first part shows you who owns what, and the second updates the owner to an account you specify.

Agent jobs being created under a user account have always been a procedural problem. This is simply another variation on the problem that we need to take into consideration and put a process in place to deal with. The most likely processes are either to only create a maintenance plan when logged on with a generic account, or to run the script after creating the maintenance plan.

I am, however, curious why Microsoft have implemented updating the jobs in this manner, and I see it as having the potential to cause significant problems in environments that may not be monitoring their jobs as closely as is required, ending up with maintenance tasks not running for some time. How to get around this? Well, given the nature of maintenance plans and the fact you must be a sysadmin to see or create them, surely it makes sense to have the owner be the SQL Service account or a user created by SQL Server for maintenance plans? Someone has posted this as a feature request on Connect here and I've added my two pennies' worth, so if you feel it should change then have your say too!

--See who owns which packages
SELECT name, description, SUSER_SNAME(ownersid)
FROM msdb.dbo.sysdtspackages90

--Now we update the owner to an appropriate domain account. Either the service account or a generic admin account is good.
UPDATE msdb.dbo.sysdtspackages90
SET OWNERSID = SUSER_SID('YOUR_DOMAIN\APPROPRIATE_ACCOUNT')
WHERE OWNERSID = SUSER_SID('YOUR_DOMAIN\OLD_ACCOUNT')

My old mate sp_recompile

October 12, 2007

As soon as I saw the error messages in the logs I thought to myself, "Oh my, that did not happen in testing" (OK, maybe it was more colourful than that).

We were creating a clustered index on a tiny little table and the index went through fine. However, the application started to generate the message "Could not complete cursor operation because the table schema changed after the cursor was declared". My gut reaction was to restart each application server in the cluster, but having restarted the first one it made no difference. It suddenly clicked that SQL Server must be dishing out the cursor plan from cache.

Now, I did not want to restart the SQL Servers because only a small part of the application was affected and I did not want a more significant outage. So, how do we get the plan out of cache? The options below detail your choices with the corresponding impact.

Action: EXEC sp_recompile 'object'
Pros: Minimal impact. When passing a table name, all procedures referencing it will be recompiled. Plans in cache not referencing the table will stay in cache.
Cons: You have to know the name of the object(s) needing to be recompiled.

Action: DBCC FREEPROCCACHE
Pros: Quick and dirty.
Cons: The procedure cache for the server is cleared, so the server will slow down whilst the cache populates again.

Action: Restart SQL
Pros: I suppose you could say you are 100% sure you have nailed the sucker.
Cons: You have a system outage and you have to wait for your procedure and buffer cache to repopulate.
The lesson to take away here is to always use sp_recompile when making any kind of DDL change; I also tend to use it on stored procs and views too. I normally always have it in my scripts, so believe you me, I gave myself a good talking to about forgetting to put it in this time.

And on a related note, have you come across sp_refreshview? No? Well, it's worth knowing about.
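
As a quick illustration (the object names are made up), the pattern worth keeping in deployment scripts looks something like this:

-- After DDL changes to a table, mark dependent plans for recompilation...
EXEC sp_recompile 'dbo.MyTinyTable';

-- ...and refresh the metadata of any non-schema-bound views that reference it.
EXEC sp_refreshview 'dbo.vMyTinyTableSummary';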

Categories: Performance, SQL Server