Home > Analysis Services, Synchronisation, Utilities > SSAS Multithreaded sync with Windows 2008 R2

SSAS Multithreaded sync with Windows 2008 R2

We have been happily running some of our systems on Wndows 2003 and have had an upgrade to W2K8 R2 on the list for quite some time. The upgrade has now completed and we can start taking advantage of some of the new features which is the reason for this post.

For a long time we have used the sample Robocopy script from the SQLCat team to synchronize some of our larger SSAS databases. If your wondering what i mean by large, around 5 TB with a good few thousand partitions.

The script works like a dream but at this scale you soon find that the copy process starts to chug when it hits the fact data as this fantastic multi threaded script now becomes single threaded again as the bulk of the data being transferred is fact data.

One way to improve the situation is to add more drives or mount points and spawn multiple copies of the script at the drive/mount point level. The biggest draw back with this approach though is the cost of managing the creation and maintenance associated with having partitions spread across multiple drives and the cost is significant enough that I kept the number down to a point that kept us within the maintenance/recovery windows.

With Windows 2008 R2 the copy script just got better because Windows 2008 R2 robocopy can leverage multi threading!

So, no longer is there a need to break out the drives for file copy throughput reasons (although there are still reasons to break them out). We now also have an easy way to maximise our network bandwidth utilisation (i`m sure our network guys may not be quite so happy Smile )

So, how to use the new functionality. As is often the case with new functionality multi threading is off by default so you have to add the switch /MT which defaults to 5 threads. If you want more then its /MT:n where n = number of threads.

So just how much throughput did i get? Well i did a test copying data from a single drive across the network with 1.2TB of cube related data. Adding the /MT to the SQLCat script to spawn 5 threads I got the results below.

  Single Threaded Multi Threaded (5)
MB 1245184 1245184
Throughput MB/Min 1688 5367
Throughput MB/Sec 28.13 89.45
Times Hr:Min 12:30 03:54

That is a serious improvement in throughput and now it is very easy to fully utilise my servers network bandwidth! It is also very easy to swamp your network card or IO subsystem with the multi threaded switch so make sure you test what the optimal setting is for you.

If your wondering whether the multi threaded capabilities of robocopy can be used in earlier versions of Windows by copying the exe from a 2008 R2 box the answer is no Sad smile. The reason is because the functionality is part of NTFS.

As always i hope you found this post informative and useful!

Advertisements
  1. jesse
    August 6, 2012 at 18:24

    Why use robocopy rather than the built in SSAS sync functionality? With 2008 and up I was always under the impression that the sync was faster than robocopy.

    thanks!

    • August 6, 2012 at 19:47

      Hi,

      I take it you mean SSAS rather than Windows 2008?

      I have seen similar results where sync can be faster than robocopy and i blogged about that https://blog.calvett.co.uk/2010/04/08/ssas-synchronisation-performance/ and i believe this is down to internal optimisations. That blog was based on a system where the data is on one drive running W2K3.

      As soon as you spread the data out across multiple drives you quickly out perform the SSAS sync as you can copy multiple sets simultaneously in W2K3. This gets even easier with multiple robocopy threads as stated in the blog. Your bottle neck then moves to network or disk io.

      But that does not answer the why questions so here are the reasons i use robocopy.

      • I have a set window to transfer 5TB across the network and ssas sync simply can not meet my SLA.
      • If the robocopy is interrupted part way through i can pick up where i left off, with sync you have to start from scratch again.

      The SQLCAT team also have a great document on the different ways to sync data http://sqlcat.com/sqlcat/b/technicalnotes/archive/2008/03/16/analysis-services-synchronization-best-practices.aspx

      I will finish off by saying that on other systems i support i do use SSAS sync. Its all depends on the application and what i`m trying to achieve.

      Hope that helps

      • August 7, 2012 at 06:32

        One other point i did not add.

        With SSAS sync, if your not syncing to an empty database, you require temporary space to store the files being sync’d but with robocopy you do not.

  2. williamanton
    December 17, 2014 at 15:41

    Hi Andrew – really enjoying your blog. Quick question, did you test with more than 5 threads? I think the limit is 128…

    • December 18, 2014 at 11:12

      Hi William,

      Not for this particular test. I had a 1gb network card on that server and with 5 threads we were close to the throughput limit. I have used higher numbers of threads for other scenarios but to date never needed to go above 20 as i would hit a bottleneck at that point.

      What experiences have you had with the multithreaded option?

      • williamanton
        December 18, 2014 at 14:00

        Never bothered with more than 5 (from the CAT script)…but I’ll do a few tests during off hours to see if/where transfer speed plateaus. Client has a pretty decent setup.

        For traditional SSAS Query Server Scale-Out architecture…is there any reason why maxing it out (i.e. saturating the network path) would be an issue? Only thing I can think of is interference with inbound queries from the load balancer…

  3. williamanton
    December 18, 2014 at 14:00

    Never bothered with more than 5 (from the CAT script)…but I’ll do a few tests during off hours to see if/where transfer speed plateaus. Client has a pretty decent setup.

    For traditional SSAS Query Server Scale-Out architecture…is there any reason why maxing it out (i.e. saturating the network path) would be an issue? Only thing I can think of is interference with inbound queries from the load balancer…

    • December 18, 2014 at 14:16

      There are the standard items to consider

      1) Network response time for other operations such as incoming queries.
      2) Impact on network infra such as your switches, you may be taking someone else out… 🙂
      3) If your going across data centre then WAN impact. Worth considering any network QOS policies
      4) Impact on the IO subsystem especially if your using 10gb nics or infiniband.

      • williamanton
        December 18, 2014 at 14:58

        makes sense – ty

  1. October 9, 2012 at 15:21

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: