Archive

Posts Tagged ‘Trouble Shooting’

Automating process monitor data collection

July 10, 2014

Recently I needed to investigate why SSAS synchronisation commands were intermittently failing; the errors pointed to a file lock on an SSAS data file. To identify the file lock I needed to run Process Monitor for as long as it took to capture the problem, even if that meant weeks or months. This presents the following problems (a rough automation sketch follows the list):

  • How to prevent the Process Monitor capture from consuming all available disk space.
  • How to ensure we do not overwrite the data we are looking for.
  • How to ensure the capture stops once we have the data we are looking for.
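
To tie these together, here is a minimal automation sketch in Python, assuming Procmon.exe is available on the machine and that its documented command-line switches (/AcceptEula, /Quiet, /Minimized, /BackingFile, /Terminate) behave as described; the paths, trigger condition, and size cap are placeholders you would adapt:

    import os
    import subprocess
    import time

    PROCMON = r"C:\Tools\Procmon.exe"          # path to Process Monitor (assumption)
    TRACE_DIR = r"D:\ProcmonTraces"            # where the backing files are written
    MAX_TRACE_BYTES = 20 * 1024**3             # cap total trace size at ~20 GB
    MARKER_LOG = r"C:\SSASLogs\msmdsrv.log"    # hypothetical log watched for the error

    def start_capture(backing_file):
        # /Quiet suppresses the filter dialog, /Minimized keeps the UI out of
        # the way, /BackingFile writes events to a .pml file instead of memory.
        subprocess.Popen([PROCMON, "/AcceptEula", "/Quiet", "/Minimized",
                          "/BackingFile", backing_file])

    def stop_capture():
        # /Terminate tells the running Procmon instance to flush and exit.
        subprocess.run([PROCMON, "/Terminate"], check=False)

    def error_seen():
        # Placeholder trigger: in practice you might scan the SSAS log for the
        # synchronisation error, or check the outcome of the last sync job.
        with open(MARKER_LOG, errors="ignore") as f:
            return "lock" in f.read().lower()

    def total_trace_bytes():
        return sum(os.path.getsize(os.path.join(TRACE_DIR, f))
                   for f in os.listdir(TRACE_DIR))

    generation = 0
    while True:
        pml = os.path.join(TRACE_DIR, f"capture_{generation}.pml")
        start_capture(pml)
        time.sleep(15 * 60)                    # capture in 15-minute generations
        stop_capture()
        if error_seen():
            break                              # keep the file holding the evidence
        # Delete the oldest generations first so the disk never fills up; the
        # file containing the evidence is never deleted because we break first.
        while total_trace_bytes() > MAX_TRACE_BYTES:
            oldest = min(os.listdir(TRACE_DIR),
                         key=lambda f: os.path.getmtime(os.path.join(TRACE_DIR, f)))
            os.remove(os.path.join(TRACE_DIR, oldest))
        generation += 1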

Read more…


Exposing the SSAS file system with the Metadata Sniffer

April 19, 2014

Very recently I encountered a production issue, and to assist with the root cause analysis I needed information and clues that are not exposed by any SSAS interface, so I created the metadata sniffer.
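
The tool itself is behind the link below, but to give a flavour of the idea, a very rough sketch of surfacing what the SSAS file system can tell you might look like the following; the data path is a placeholder and the script only reports names, sizes, and timestamps, not whatever the real metadata sniffer extracts:

    import datetime
    import os

    # Placeholder: the Data directory of the SSAS instance under investigation.
    DATA_DIR = r"C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Data"

    def snapshot(root):
        # SSAS embeds version numbers in file and folder names (e.g.
        # 'MyCube.0.db'), so even a plain listing exposes clues about which
        # object versions exist on disk at a point in time.
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                yield path, st.st_size, datetime.datetime.fromtimestamp(st.st_mtime)

    if __name__ == "__main__":
        for path, size, modified in snapshot(DATA_DIR):
            print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>12}  {path}")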

Read more…

SSAS Crash Dump behavior

January 17, 2012

In SSAS 2005, when an exception occurs you always get a crash dump generated in the log directory of the instance. This can lead to a lot of crash dumps consuming resources, and a lot of work for the person who has to analyse the contents of each dump. One benefit of this approach is that if a user reported a serious error, a quick check of the log directory was a good indicator that there might be a problem.

In SSAS 2008 this behaviour changes: by default SSAS will generate one dump per unique exception and then suppress all duplicates until the service is restarted, although the number of duplicate dumps to be generated can be controlled by <MaxDuplicateDumps>.

This is a good enhancement: it reduces the need to analyse a dump just to confirm it's a duplicate, and it introduces a mechanism to limit the resource impact of dumps. But it's not perfect.

The biggest problem I have with this implementation is that there is no logging anywhere that a dump has been suppressed.

In an ideal world there would only be one exception occurring in your system at any one time and monitoring would always catch that first exception, but this is not an ideal world and, IMHO, there is no substitute for good logging.
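
Until such logging exists you can approximate it yourself by watching the log directory for new dumps. Below is a minimal sketch assuming the default dump location and the .mdmp extension; the path and polling interval are placeholders:

    import os
    import time

    # Placeholder: the Log directory of the SSAS instance being monitored.
    LOG_DIR = r"C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Log"

    def dump_files(path):
        # SSAS writes its minidumps (.mdmp) alongside the log files.
        return {f for f in os.listdir(path) if f.lower().endswith(".mdmp")}

    seen = dump_files(LOG_DIR)
    while True:
        time.sleep(60)                       # poll once a minute
        current = dump_files(LOG_DIR)
        for new_dump in sorted(current - seen):
            # Replace print with a write to your monitoring system or event log.
            print(f"New SSAS crash dump detected: {new_dump}")
        seen = current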

I have raised a Connect item to improve the crash dump behaviour and made the suggestions below. If you think these enhancements should be made, then vote now.

  • When a duplicate dump is suppressed, log to the event log the dump signature, the location and filename of the last dump generated, and the running total of duplicates to date.
  • Introduce a mechanism to reset the duplicate dump count without restarting the service.
  • If an exception leads to a service failure, a dump should always be generated.
  • Consider writing a stub file to the log directory for each suppressed dump.

Enter the SSAS server level lock…

September 23, 2009

OK, so your reaction to the title is probably the same as mine when I found out about SSAS server level locks! I will give you the scripts to reproduce the server level lock, but first let's get down to business… 🙂

Server level locks were introduced in one of the 2005 SP2 cumulative updates; at the moment all I can say is that it was pre-CU12. I'm not sure why the lock was introduced, but it is likely to have been in response to a “feature” 🙂

Fortunately the lock only appears at the end of processing, when SSAS commits its data, and commits are usually quick, so depending on when you do your processing you might never see it. So why am I so horrified by the existence of this lock, other than that it is simply wrong to prevent connections to the server? Below are my concerns…

  • If a query is running when processing comes to commit, the commit must queue behind the query for a default of 30 seconds, but processing is still granted the server level lock, meaning no one can connect for up to 30 seconds plus the commit time and users get connection errors! (A sketch for catching these in the wild follows this list.)
  • ForceCommitTimeout is the setting that controls how long a commit job waits before killing the queries ahead of it. Think of this setting not only as the time you're allowing queries to complete before being killed, but also as the additional length of time you're prepared to deny users access to the server.
  • The real kick in the pants comes when you find out that there are scenarios where a query will not respond to the cancel invoked by ForceCommitTimeout. The obvious one is when there is a bug, but there are others. This means the commit can't kill the query, your server is effectively hung, and the users are screaming. What's worse, the SYSADMIN can't connect to the server to diagnose the problem because the server lock blocks them!
  • I have seen connections error while connecting to the server due to the server level lock, which is even worse. Unfortunately I have not managed to identify the repro (yet).
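
One way to catch these connection failures in the wild is to probe the server continuously and log every failed attempt with a timestamp, then correlate the failures with your processing schedule. Here is a minimal sketch assuming the instance is exposed over HTTP via msmdpump.dll (the URL is a placeholder); any client that opens a real session would do just as well:

    import datetime
    import time

    import requests  # assumes XMLA-over-HTTP access via msmdpump.dll is configured

    XMLA_URL = "http://yourserver/olap/msmdpump.dll"  # placeholder endpoint

    # A minimal XMLA Discover request; getting a response back proves the server
    # accepted a new connection rather than blocking it behind the server lock.
    DISCOVER = """<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
      <Body>
        <Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
          <RequestType>DBSCHEMA_CATALOGS</RequestType>
          <Restrictions><RestrictionList/></Restrictions>
          <Properties><PropertyList/></Properties>
        </Discover>
      </Body>
    </Envelope>"""

    while True:
        started = datetime.datetime.now()
        try:
            response = requests.post(
                XMLA_URL,
                data=DISCOVER,
                headers={"Content-Type": "text/xml",
                         "SOAPAction": '"urn:schemas-microsoft-com:xml-analysis:Discover"'},
                timeout=10,
            )
            response.raise_for_status()
            status = "ok"
        except Exception as exc:
            status = f"FAILED: {exc}"        # correlate these with commit windows
        elapsed = (datetime.datetime.now() - started).total_seconds()
        print(f"{started:%Y-%m-%d %H:%M:%S}  {elapsed:6.2f}s  {status}")
        time.sleep(5)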

Read more…