Archive for the ‘Trouble Shooting’ Category

Automating process monitor data collection

July 10, 2014

Recently I needed to investigate why SSAS synchronisation commands were intermittently failing; the errors pointed to a file lock on an SSAS data file. To identify the lock I needed to run Process Monitor for as long as it took to capture the problem, even if that took weeks or months. This presents the following problems:

  • How to prevent the Process Monitor capture from consuming all available disk space.
  • How to ensure we do not overwrite the data we are looking for.
  • How to ensure the capture stops once we have the data we are looking for.
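The retention and stop-condition side of this can be sketched in a short script. This is only an illustration, not the author's actual tooling: the folder path, the `.pml` extension, the size cap and the sentinel-file convention are all assumptions; Process Monitor itself would be launched separately with its own backing-file options.

```python
from pathlib import Path

def prune_captures(folder: str, cap_bytes: int) -> list:
    """Delete the oldest .pml capture files until the folder fits under cap_bytes.

    Files are removed oldest-first so the most recent data, which is the data
    surrounding any failure, is never the data sacrificed to free disk space.
    Returns the names of the files that were deleted.
    """
    files = sorted(Path(folder).glob("*.pml"), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for p in files:  # oldest first
        if total <= cap_bytes:
            break
        total -= p.stat().st_size
        deleted.append(p.name)
        p.unlink()
    return deleted

def should_stop(folder: str, sentinel: str = "problem.flag") -> bool:
    """Stop the capture loop once a sentinel file appears.

    The sentinel would be written by whatever detects the failure, for example
    the job that sees the synchronisation error.
    """
    return (Path(folder) / sentinel).exists()
```

A scheduled task could call `prune_captures` between capture rotations and exit the rotation loop as soon as `should_stop` returns true, addressing all three problems above.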

Read more…

Exposing the SSAS file system with the Metadata Sniffer

April 19, 2014

Very recently I encountered a production issue, and to assist in the root cause analysis I needed information and clues that are not exposed by any SSAS interface, so I created the metadata sniffer.

Read more…

SSAS – Out of memory exception during processing

March 3, 2014

For a while now I have been analysing a situation, and I have got it down to a fine art whereby I do the following:

  1. Create a memory-intensive situation in which I am using 85% of available memory and SSAS is working hard to free memory.
  2. Run a ProcessAdd on a dimension and capture statistics.
  3. Review statistics, contemplate the results and then tweak something.
  4. Start at step 1 again.
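For context, the ProcessAdd in step 2 is issued as an XMLA Process command; a minimal sketch looks like the fragment below, where the database and dimension IDs are placeholders, not the actual objects from this scenario.

```xml
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <DimensionID>MyDimension</DimensionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>
```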

The other day, when I did my ProcessAdd, I was presented with the following error.

The following exception occurred while an operation was being performed on a data source view: The type initializer for ‘System.Xml.Schema.XmlSchemaComplexType’ threw an exception.;Exception of type ‘System.OutOfMemoryException’

Read more…

Extreme load and the “operation has been cancelled”

September 18, 2011

There are many reasons why SSAS may generate an “operation has been cancelled” error, but I want to take you through a new, although rare, one that I recently discovered.

To set the scene, I had been tuning the Process Index phase on one of our servers so we could comfortably fit a full reindex of a 1.8TB cube into a weekend maintenance slot. I will write another post about the actual performance and how we achieved it.
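Builds like this are typically driven by an XMLA Process command wrapped in a Parallel element, whose MaxParallel attribute is the “throttle” referred to below. The fragment is a sketch only: the object IDs are placeholders and the MaxParallel value is purely an example, not the configuration used on this server.

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel MaxParallel="8">
    <Process>
      <Object>
        <DatabaseID>MyDatabase</DatabaseID>
        <CubeID>MyCube</CubeID>
      </Object>
      <Type>ProcessIndexes</Type>
    </Process>
  </Parallel>
</Batch>
```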

My tuning was going very well and we were pushing the box to its limits, which is exactly what you want to do in a maintenance window. I was running the index build server side when I suddenly received a “The operation has been cancelled” message. I immediately went to look at my server trace and found that it had stopped recording at exactly the same time. How annoying.

With the current index build configuration, every time I ran the build it failed with a cancelled error, yet if I changed the configuration and eased off the throttle it would complete. This was unacceptable, as I wanted every last drop out of the server. It was clear to me that “something” was causing the cancel when I pushed the server’s CPUs to 100%, so I dug out my friend Process Monitor; below is a screen shot of my “moment of discovery”.

[Screenshot: Process Monitor capture at the moment of discovery]

Read more…

SSAS Exposes a W2K3 bug

April 6, 2011

I recently came across a bug in Windows 2003 SP2 which causes a blue screen of death, and SSAS was exposing it. In my scenario SSAS was the match, a file system cache function was the fuse and a BSOD the payload… If you are an all-Windows-2008 shop then there is no need to read on, and I am envious.

One of the configurations I always set on an SSAS server is LimitFileSystemCache, one reason being that if you do not, the cache can grow to the point where it has a negative impact on SSAS. When you set this configuration, SSAS uses the API functions “GetSystemFileCacheSize” and “SetSystemFileCacheSize”, and this is where the problem lies.

When SSAS starts for the first time it will set the file system cache, and your server will be fine until you have to stop and then start the SSAS service without rebooting the server. The problem is that when SSAS stops, the operating system cleans up the memory relating to the file system cache but one structure is not reset, and as soon as that structure is accessed it causes the blue screen. The article and hotfix for this problem is KB979128.

So, if you have PreAllocated your SSAS memory, which is something I often do, then you are quite likely to get the blue screen when SSAS starts, but you might not, as it is very hit and miss. I found that if I did not get it at startup I would definitely get it when I pushed the server to its limits.

Now, should you rush out and apply this patch? Well, it is always good to ensure stability, but I have been supporting SSAS servers for years and the first time I saw the blue screen was a few months ago in test, so you can at least take your time to test the patch before rolling it out…

It is also worth noting that other users of tools that set the file system cache may well experience this problem; I mention this because it is not uncommon to limit the file system cache for SQL Database deployments…

Below is the OS bug check we got when we hit the problem.

Reason Code: 0x805000f
Bugcheck String: 0x0000000a (0x0000000000000028, 0x0000000000000002, 0x0000000000000000, 0xfffff800010e9a8b)

Enter the SSAS server level lock……

September 23, 2009

OK, so your reaction to the title is probably the same as mine when I found out about SSAS server level locks! I will give you the scripts to reproduce the server level lock, but first let’s get down to business… 🙂

Server locks were introduced in one of the 2005 SP2 cumulative updates; at the moment all I can say is that it was pre-CU12. I’m not sure why the lock was introduced, but it is likely to be in response to a “feature” 🙂

Fortunately the lock only appears at the end of processing, when SSAS commits its data, and commits are usually quick, so depending on when you do your processing you might never see it. So why am I so horrified by the existence of this lock, other than that it is simply wrong to prevent connections to the server? Below are my concerns…

  • If a query is running when processing comes to commit, the commit must queue behind the query for a default of 30 seconds, but processing still gets the server level lock granted, meaning no one gets to connect for up to 30 seconds plus commit time, and users get connection errors!
  • ForceCommitTimeout is the setting that controls how long a commit job waits before killing the queries ahead of it. People should think of this setting not only as the time you are allowing queries to complete before being killed, but also as the additional time you are prepared to deny users access to the server.
  • The real kick in the pants comes when you find out that there are scenarios where a query will not respond to the query cancel invoked by ForceCommitTimeout. The obvious one is when there is a bug, but there are others. This means that the commit cannot kill the query, your server is effectively hung and the users are screaming. What’s worse, the SYSADMIN cannot connect to the server to diagnose the problem because the server lock blocks them!
  • I have also seen connections error when connecting to the server due to the server level lock, which is even worse. Unfortunately I have not managed to identify the repro (yet).
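To make the first concern concrete, the worst-case window during which new connections are refused can be reckoned with some back-of-the-envelope arithmetic. This is my own sketch of the reasoning described above, not an official formula, and the helper name and example durations are invented for illustration.

```python
def worst_case_lockout_ms(force_commit_timeout_ms: int,
                          commit_duration_ms: int) -> int:
    """Worst-case time (ms) new connections can be refused around a commit.

    The commit takes the server level lock, waits up to ForceCommitTimeout
    for queries ahead of it (killing them if they are still running), then
    performs the commit itself while the lock is still held.
    """
    return force_commit_timeout_ms + commit_duration_ms
```

With the default ForceCommitTimeout of 30 seconds and a hypothetical 5-second commit, `worst_case_lockout_ms(30000, 5000)` gives a 35-second window in which users cannot connect.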

Read more…