SSAS Multithreaded sync with Windows 2008 R2
We have been happily running some of our systems on Wndows 2003 and have had an upgrade to W2K8 R2 on the list for quite some time. The upgrade has now completed and we can start taking advantage of some of the new features which is the reason for this post.
For a long time we have used the sample Robocopy script from the SQLCat team to synchronize some of our larger SSAS databases. If your wondering what i mean by large, around 5 TB with a good few thousand partitions.
The script works like a dream but at this scale you soon find that the copy process starts to chug when it hits the fact data as this fantastic multi threaded script now becomes single threaded again as the bulk of the data being transferred is fact data.
One way to improve the situation is to add more drives or mount points and spawn multiple copies of the script at the drive/mount point level. The biggest draw back with this approach though is the cost of managing the creation and maintenance associated with having partitions spread across multiple drives and the cost is significant enough that I kept the number down to a point that kept us within the maintenance/recovery windows.
With Windows 2008 R2 the copy script just got better because Windows 2008 R2 robocopy can leverage multi threading!
So, no longer is there a need to break out the drives for file copy throughput reasons (although there are still reasons to break them out). We now also have an easy way to maximise our network bandwidth utilisation (i`m sure our network guys may not be quite so happy )
So, how to use the new functionality. As is often the case with new functionality multi threading is off by default so you have to add the switch /MT which defaults to 5 threads. If you want more then its /MT:n where n = number of threads.
So just how much throughput did i get? Well i did a test copying data from a single drive across the network with 1.2TB of cube related data. Adding the /MT to the SQLCat script to spawn 5 threads I got the results below.
Single Threaded | Multi Threaded (5) | |
MB | 1245184 | 1245184 |
Throughput MB/Min | 1688 | 5367 |
Throughput MB/Sec | 28.13 | 89.45 |
Times Hr:Min | 12:30 | 03:54 |
That is a serious improvement in throughput and now it is very easy to fully utilise my servers network bandwidth! It is also very easy to swamp your network card or IO subsystem with the multi threaded switch so make sure you test what the optimal setting is for you.
If your wondering whether the multi threaded capabilities of robocopy can be used in earlier versions of Windows by copying the exe from a 2008 R2 box the answer is no . The reason is because the functionality is part of NTFS.
As always i hope you found this post informative and useful!
Why use robocopy rather than the built in SSAS sync functionality? With 2008 and up I was always under the impression that the sync was faster than robocopy.
thanks!
Hi,
I take it you mean SSAS rather than Windows 2008?
I have seen similar results where sync can be faster than robocopy and i blogged about that https://blog.calvett.co.uk/2010/04/08/ssas-synchronisation-performance/ and i believe this is down to internal optimisations. That blog was based on a system where the data is on one drive running W2K3.
As soon as you spread the data out across multiple drives you quickly out perform the SSAS sync as you can copy multiple sets simultaneously in W2K3. This gets even easier with multiple robocopy threads as stated in the blog. Your bottle neck then moves to network or disk io.
But that does not answer the why questions so here are the reasons i use robocopy.
• I have a set window to transfer 5TB across the network and ssas sync simply can not meet my SLA.
• If the robocopy is interrupted part way through i can pick up where i left off, with sync you have to start from scratch again.
The SQLCAT team also have a great document on the different ways to sync data http://sqlcat.com/sqlcat/b/technicalnotes/archive/2008/03/16/analysis-services-synchronization-best-practices.aspx
I will finish off by saying that on other systems i support i do use SSAS sync. Its all depends on the application and what i`m trying to achieve.
Hope that helps
One other point i did not add.
With SSAS sync, if your not syncing to an empty database, you require temporary space to store the files being sync’d but with robocopy you do not.
Hi Andrew – really enjoying your blog. Quick question, did you test with more than 5 threads? I think the limit is 128…
Hi William,
Not for this particular test. I had a 1gb network card on that server and with 5 threads we were close to the throughput limit. I have used higher numbers of threads for other scenarios but to date never needed to go above 20 as i would hit a bottleneck at that point.
What experiences have you had with the multithreaded option?
Never bothered with more than 5 (from the CAT script)…but I’ll do a few tests during off hours to see if/where transfer speed plateaus. Client has a pretty decent setup.
For traditional SSAS Query Server Scale-Out architecture…is there any reason why maxing it out (i.e. saturating the network path) would be an issue? Only thing I can think of is interference with inbound queries from the load balancer…
Never bothered with more than 5 (from the CAT script)…but I’ll do a few tests during off hours to see if/where transfer speed plateaus. Client has a pretty decent setup.
For traditional SSAS Query Server Scale-Out architecture…is there any reason why maxing it out (i.e. saturating the network path) would be an issue? Only thing I can think of is interference with inbound queries from the load balancer…
There are the standard items to consider
1) Network response time for other operations such as incoming queries.
2) Impact on network infra such as your switches, you may be taking someone else out… 🙂
3) If your going across data centre then WAN impact. Worth considering any network QOS policies
4) Impact on the IO subsystem especially if your using 10gb nics or infiniband.
makes sense – ty