Extreme load and the “operation has been cancelled”


September 18, 2011

There are many reasons why SSAS may raise an “operation has been cancelled” error, but I want to take you through a new, albeit rare, one that I recently discovered.

To set the scene, I had been tuning the Process Index phase on one of our servers so that we could comfortably fit a full reindex of a 1.8 TB cube into a weekend maintenance slot. I will write another post about the actual performance and how we achieved it.

My tuning was going very well and we were pushing the box to its limits, which is exactly what you want to do in a maintenance window. I was running the index build server side when I suddenly received “The operation has been cancelled” message. I immediately went to look at my server trace and found that it had stopped recording at exactly the same time. How annoying.

With the current index build configuration, every time I ran the build it failed with a cancelled error, yet if I changed the configuration and eased off the throttle it would complete. This was unacceptable, as I wanted every last drop out of the server. It was clear to me that “something” was causing the cancel when I pushed the server's CPUs to 100%, so I dug out my old friend Process Monitor. Below is a screenshot of my “moment of discovery”.

So, what’s happening?

Above are details of the TCP communications occurring between SSAS, ASTrace (my tracing tool of choice) and SSMS.
The numbers in the paths are the ports in use, and I have highlighted them so you can see SSAS communicating with both SSMS and ASTrace.
Suddenly there is a block of TCP retransmits, and SSAS disconnects the clients 4 seconds after the first retransmit! This coincides precisely with my “operation has been cancelled” error.
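If you capture the same events and export them, the gap between the first retransmit and the disconnect can be measured per connection rather than by eye. The sketch below assumes the trace has already been reduced to simple (time in seconds, operation, path) records; the `Event` record type, the example path and the conversion step from a Process Monitor export are hypothetical, while “TCP Retransmit” and “TCP Disconnect” are the operation names Process Monitor shows for these events.

```python
from collections import namedtuple

# Hypothetical, simplified trace record: seconds since the trace started,
# the Process Monitor operation name, and the connection path (host:port -> host:port).
Event = namedtuple("Event", ["time", "operation", "path"])


def retransmit_to_disconnect(events):
    """For each connection path, return the seconds between its first
    'TCP Retransmit' and the first 'TCP Disconnect' that follows it."""
    first_retransmit = {}
    gaps = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.operation == "TCP Retransmit":
            # Remember only the first retransmit seen on this connection.
            first_retransmit.setdefault(ev.path, ev.time)
        elif (ev.operation == "TCP Disconnect"
              and ev.path in first_retransmit
              and ev.path not in gaps):
            gaps[ev.path] = ev.time - first_retransmit[ev.path]
    return gaps


if __name__ == "__main__":
    # Illustrative events only; 2383 is the default SSAS port.
    trace = [
        Event(10.0, "TCP Retransmit", "ssas:2383 -> client:50123"),
        Event(10.5, "TCP Retransmit", "ssas:2383 -> client:50123"),
        Event(14.0, "TCP Disconnect", "ssas:2383 -> client:50123"),
    ]
    print(retransmit_to_disconnect(trace))
```

Connections that disconnect without any preceding retransmit are deliberately left out, since they are normal client disconnects rather than the pattern described here.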

What I also noticed was that when these retransmits occurred, the processors had just gone from 99% to 100%, so it seems that I may have been exposing a “weakness” in the TCP stack under very heavy load.

My next step was to see whether the retransmit behaviour is configurable, which it is. To increase the retransmit period, you change the TcpMaxDataRetransmissions value in the registry as documented here.
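Raising this value lengthens the retransmit period more than you might expect, because in classic TCP behaviour the retransmission timeout doubles after each unacknowledged retry. The sketch below is a simplified model under that assumption: the initial timeout (which a real stack derives from measured round-trip times), the absence of an upper cap on the timeout, and the final wait of one more timeout before giving up are all assumptions, not values taken from the server in this post.

```python
def retransmit_schedule(initial_rto, max_retransmissions):
    """Simplified exponential-backoff model (an assumption, not Windows internals).

    Returns (retransmit_times, drop_time), all in seconds after the original send.
    """
    retransmit_times = []
    t = 0.0
    rto = initial_rto
    for _ in range(max_retransmissions):
        t += rto                 # wait one timeout, then retransmit
        retransmit_times.append(t)
        rto *= 2                 # back off: double the timeout for the next retry
    # Assume the stack waits one more (doubled) timeout before dropping the connection.
    drop_time = t + rto
    return retransmit_times, drop_time


if __name__ == "__main__":
    # Illustrative initial timeout only; compare a few retry limits.
    for n in (3, 5, 6):
        times, drop = retransmit_schedule(0.5, n)
        print(n, times, "dropped at", drop, "s;",
              "window after first retransmit:", drop - times[0], "s")
```

Under this model each extra retry roughly doubles the time the connection survives, which is why moving the setting from just below the default to just above it can make a large difference during a short burst of 100% CPU.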

On our servers the value was set slightly below the default, so I increased it to slightly above the default, and my Process Index operations went through.

Below is the output from Process Monitor with the registry change in place. As you can see, there are more retransmits, but this time they do not end in a disconnection.

Now, you may be concerned that this slows down the overall processing, but in my case this patch of retransmissions happened only once or twice during processing, so it did not cause a material increase in processing time.

If you are experiencing the same symptoms and need to change the registry value, you must also consider the impact of the change on clients, and remember that I have only seen this when the server is under extreme load on a Windows Server 2003 SP2 box.

Finally, to make this discovery an even sweeter victory, once the index builds were going through I was able to shave another 10% off the run time.

Categories: Analysis Services, Tips, Trouble Shooting

Andrew Calvett's SQL Server Blog