Setting up a Virtual Cluster to speed up Compressor

written by Stefaan Lesage on 16/01/2009

Introduction

[Image: Final Cut Studio Machine]

In the previous article, 'Exporting from FCP to Compressor, or use a QuickTime Movie instead ?', we looked at two ways to get your Sequence from Final Cut Pro into Compressor, along with the pros and cons of each workflow. We also discovered that using an intermediate QuickTime movie allowed us to keep using Final Cut Pro while Compressor was doing its job. In this article we will look at how to increase performance on the Compressor part of the workflow by setting up a Virtual Cluster.

Today's Macs have multiple processors and/or cores, so our machines have a lot of raw processing power. Still, I was under the impression that we weren't using the full power of our eight-core Mac Pro.

In my previous tests, I was using an average of 35% CPU power across all 8 cores in my Mac Pro. There must be a way to use more of it, and the solution is to set up and use a QuickCluster, or so-called Virtual Cluster!

Setting up the Virtual Cluster

If you have installed Final Cut Studio, you have a whole package of tools and utilities that can assist you in your daily video editing and compression tasks. One of those utilities is called Apple QMaster. QMaster can be used to build a Compressor cluster consisting of a combination of shared storage, Xserves, Macs and Fibre Channel, all connected to a network. QMaster will split and distribute a compression job for you. But even if you have just a single Mac Pro, you can still use QMaster to create a Virtual Render Cluster, which can significantly boost compression speeds.

Configuring QMaster

Open System Preferences and select the Apple QMaster option listed under Other. You should get the QMaster Preferences pane, which looks like this:

[Image: QMaster Preferences pane]

In order to set up a Virtual Render Cluster, choose the QuickCluster with services option under Share this computer as. Under Services, you should normally find 2 entries: one labeled Shake and other shell command processing, and another called Distributed processing for Compressor. Since we are looking for ways to speed up Compressor, we need to take a closer look at the Distributed processing for Compressor service. If you select that service, you will notice Selected Service on (1 Instance) next to the options button, and that is exactly where there is room for improvement.

This means that a single instance of Compressor will be used to process the Compressor job. Since we have 2x4 cores in our Mac Pro, we can safely increase the number of instances Compressor will use. I have been experimenting a bit, and from what I read you should get the best performance with 1 Compressor instance for every 2 cores. So if you have a quad-core machine you can use 2 instances, and if you have an eight-core machine you can use 4 instances. Another general rule of thumb is to have at least 1 GB of memory for each Compressor instance. Depending on your system you can play with the number of instances. I discovered that on my machine, using 5 instances gave me the best performance: Compressor finishes the job a lot faster, while I'm still able to do some other things on my machine.
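The two rules of thumb above can be turned into a small calculation. This is only a sketch of the arithmetic, not something QMaster does for you: the function name `suggested_instances` and the memory figures in the examples are assumptions made for illustration, and the result is a starting point to experiment from (on my machine 5 instances worked better than the 4 this rule suggests).

```python
def suggested_instances(cores: int, memory_gb: float) -> int:
    """Starting point for the number of Compressor instances.

    Applies the two rules of thumb from the text:
    one instance per two cores, and at least 1 GB of memory per instance.
    """
    by_cores = cores // 2        # 1 instance for every 2 cores
    by_memory = int(memory_gb)   # at least 1 GB per instance
    # Never suggest fewer than one instance.
    return max(1, min(by_cores, by_memory))


if __name__ == "__main__":
    # Hypothetical machines; the memory sizes are illustrative only.
    examples = [
        ("Quad core, 4 GB", 4, 4),
        ("Eight core, 8 GB", 8, 8),
        ("Eight core, 3 GB", 8, 3),
    ]
    for label, cores, memory_gb in examples:
        count = suggested_instances(cores, memory_gb)
        print(f"{label}: start with {count} instance(s)")
```

Treat the number it prints as the value to enter first, then raise or lower it while watching how responsive the machine stays.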

So, select Distributed processing for Compressor from the list of Services, check the Share check-box (no idea what the Managed on is for yet, but you can check that one too if you want). Next, click on the Options for selected service... button and enter the number of compressor instances you want to use.

[Image: Setting up 5 Compressor Instances]

The only thing we now have to do is enter a name for our Quick Cluster or Virtual Cluster. Fill in the name under Identify this QuickCluster as:. On our Virtual Cluster we used MPCluster as the name. Once everything is set up correctly, click the Start Sharing button. There you go, you have a Virtual Cluster ready.

Using the Virtual Cluster

If we look back at our previous article, we know we had two possible workflows. We could export from FCP to Compressor, or we could export a QT Movie and bring that into Compressor. Sadly, with a Virtual Cluster you can only use the latter technique, which means you will have to export a QT Movie first. And here it really depends on which QT Movie you use as the source in order to get the most out of the Virtual Cluster.

Let's try the same approach as we used before. We exported a 3 minute 41 second HDV1080i50 clip to a Self Contained QT Movie. Now we can bring that into Compressor, add the two targets, and set the destination.

[Image: Submit the Compressor Job to our QuickCluster / Virtual Cluster]

Clicking on the Submit button brings up a new window, and you will notice that you can now choose your Virtual Cluster from the list of clusters. You should now have 2 entries there: This Computer and, in our setup, MPCluster, which is our Virtual Render Cluster. Choose your Virtual Cluster and hit Submit.

Hitting Submit will now send the job to Compressor, which will try to use the available Compressor instances to finish its task. In our case the job had 2 targets and each target got split into 2 segments (one for audio and one for video). This is (probably) due to the fact that our sample was just too short to split up any further.

Time needed to finish the job

[Image: Time needed]

Compressor took 9 minutes and 50 seconds to finish off the task, which is about 5 minutes faster than our test without the Virtual Cluster. It still only used about 65% average CPU power on all 8 cores, but again, that might be because the compression job was just too small.

It is indeed a good start, but the technique really starts to shine if you add more targets or have bigger compression jobs. Our current job was just too small for Compressor to split it into enough segments to process simultaneously. To demonstrate that, I needed a bigger sample, so I did another test run.

Some more test results

This time I wanted a bigger compression job, so I created a new Sequence and nested my old sample sequence into it 5 times in a row. This gave me a Sequence with the same settings as before, but 5 times the duration, for a total of 18 minutes and 26 seconds.

I went ahead and exported the sequence to a QT Movie, brought it into Compressor and submitted the job without using the Virtual Cluster. It took Compressor 1 hour 33 minutes 44 seconds to complete the task.

The next test was to see how long it would take to process the same Job, but this time using our Virtual Cluster. Soon after submitting the job to the Virtual Cluster, the MacPro split each target into 10 different segments, and started processing 5 segments at the same time. Processor power on all 8 cores averaged between 95 and 100%, but the machine wasn't getting overloaded. Compressor finished the job in 49 minutes 3 seconds.
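The difference between the short and the long job can be illustrated with a very simplified model. The sketch below assumes that every segment costs the same amount of work and that instances pick up segments in rounds; that is an assumption made for illustration, not a description of how QMaster actually schedules work. The function name `schedule_summary` is hypothetical.

```python
import math


def schedule_summary(targets: int, segments_per_target: int, instances: int):
    """Return (total segments, instances busy at peak, rounds of work).

    Simplified model: all segments are assumed equal in cost, and each
    instance handles one segment per round.
    """
    total_segments = targets * segments_per_target
    busy_at_peak = min(total_segments, instances)
    rounds = math.ceil(total_segments / instances)
    return total_segments, busy_at_peak, rounds


if __name__ == "__main__":
    # Short clip: 2 targets, each split into 2 segments, 5 instances.
    print("short job:", schedule_summary(2, 2, 5))
    # Long clip: 2 targets, each split into 10 segments, 5 instances.
    print("long job: ", schedule_summary(2, 10, 5))
```

In this model the short job never has more than 4 segments to hand out, so one of the 5 instances sits idle, while the long job keeps all 5 instances busy for several rounds. That matches the lower CPU usage seen on the short clip.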

For those interested, I've included a little table with the results of my tests:

Table 1: Time needed by Compressor to finish a job with and without the use of a Virtual Cluster

| Source & Target | This Computer | Virtual Cluster | Difference |
|---|---|---|---|
| HDV1080i50 clip (3 min 41 sec) to Apple TV and iPod Preset | 14 minutes 40 seconds | 9 minutes 50 seconds | approx. 5 minutes |
| HDV1080i50 clip (18 min 26 sec) to Apple TV and iPod Preset | 1 hour 33 minutes 44 seconds | 49 minutes 3 seconds | 44 minutes 41 seconds |
| Apple ProRes 422 720p25 (20 min 59 sec) to Apple TV and iPod Preset | 32 minutes 59 seconds | 16 minutes 52 seconds | 16 minutes 7 seconds |
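For readers who want to compare their own timings in the same way, the differences in Table 1 can be recomputed with a few lines of Python. This is a minimal sketch: the helper names `to_seconds`, `fmt` and `compare` are made up for this example, and the only data used are the times from the table.

```python
def to_seconds(h: int = 0, m: int = 0, s: int = 0) -> int:
    """Convert hours, minutes and seconds to a number of seconds."""
    return h * 3600 + m * 60 + s


def fmt(seconds: int) -> str:
    """Format a number of seconds as h:mm:ss."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def compare(this_computer: int, virtual_cluster: int):
    """Return (time saved in seconds, speed-up factor)."""
    return this_computer - virtual_cluster, this_computer / virtual_cluster


# (source, time on This Computer, time on Virtual Cluster), from Table 1.
RESULTS = [
    ("HDV1080i50 clip (3 min 41 sec)", to_seconds(m=14, s=40), to_seconds(m=9, s=50)),
    ("HDV1080i50 clip (18 min 26 sec)", to_seconds(1, 33, 44), to_seconds(m=49, s=3)),
    ("Apple ProRes 422 720p25 (20 min 59 sec)", to_seconds(m=32, s=59), to_seconds(m=16, s=52)),
]

if __name__ == "__main__":
    for source, local, cluster in RESULTS:
        saved, factor = compare(local, cluster)
        print(f"{source}: saved {fmt(saved)}, {factor:.2f}x faster")
```

Expressed as a ratio, the short clip finished roughly 1.5 times faster on the Virtual Cluster, while the two longer jobs finished a little over 1.9 times faster.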

Note

When Mac OS X 10.5 came out, I had serious issues setting up my Virtual Cluster and actually using it. From time to time it would just stop working, or my Virtual Cluster wouldn't show up anymore. I tried a lot of things, including reinstalling Final Cut Studio and even reinstalling my Mac Pro completely, but my Virtual Cluster would still only work 'occasionally'. Finally I found a post on the web suggesting to turn off 'Back to my Mac' in the System Preferences. This problem might be fixed by now, but I never turned Back to my Mac on again. So if your Virtual Cluster doesn't show up in Compressor once you have started sharing it, you might want to try turning off 'Back to my Mac'.

Comments

  • 1

    Really good information with nice tips along with screen shots too. Thanks for sharing valuable information.

    written by Apple iPods on 13/03/2009
  • 2

    Well, thanks a lot for sharing this! I will try to speed up my Compressor :)

    P.S.
    Is it just me, or is there a mistake at the end?

    In “Some More Result”, at the end, it is written that “Compressor finished the job in 1 hour 33 minutes 44 seconds”
    ... well that is the same time that took without the virtual cluster!

    and after that you write in the comparison:
    HDV1080i50 clip (18 min 26 sec) to Apple TV and iPod Preset   1 hour 33 minutes 44 seconds   49 minutes 3 seconds

    written by Ben on 02/04/2009
  • 3

    Hi Ben,

    I hope this information will help you when setting up your Virtual Cluster.  If not, feel free to ask any questions you have and I’ll try to help you out.

    Oh, and thank you for pointing out the mistake in the post.  You were right, I used the wrong statistics in the Text, the correct ones are in the Table though.  That’s what happens when you use Copy / Paste once too often :-)

    Thanks for pointing that out, I’ve updated the post to correct the mistake.

    Regards,

    Stefaan

    written by Stefaan Lesage on 02/04/2009
  • 4

    I’m looking at the Apple Qmaster page as I write this.  The “Options for selected service. . .” button is grayed out.  The “Selected Service Off (4 instances)” label changed to 4 (from 1) when I checked Share in the Services box for Rendering only.  The name of my computer is present in the QuickCluster / Identify this QuickCluster as: box, but it is also grayed out.  I have a Dual-core Mac Pro (4 processors total @ 2.67 each) with 6 gigs of memory currently.  My Activity Monitor shows one pane currently.  Any thoughts on why I can’t create 2 instances of sharing rendering would be appreciated.  Thank you for taking the time to help out with this situation.  Marius

    written by Marius on 12/04/2009
  • 5

    Hi Marius,

    If the Service is already shared, you won’t be able to change the settings.  The button on the preferences pane will display ‘Stop Sharing’.  Click that button to stop the service, and now you should be able to change the settings for each individual service.

    If you already have set up 4 instances, and shared the service, you should normally see 4 CompressorTranscoderX tasks in the Activity Monitor.

    You should also be able to select your Quick Cluster from the dropdown when submitting your compressor job.

    If it doesn’t work yet there are a few things you could try :

    In Compressor, go to the Compressor Menu and select ‘Reset Background Processing’ from the menu.

    Try to turn off ‘Back to my Mac’.  I know I couldn’t get it working when Back to my Mac was turned on, but this could have been resolved by now.

    Let me know if this solved your problem.


    Regards,


    Stefaan

    written by Stefaan Lesage on 13/04/2009
  • 6

    Awesome. Have been looking for something like this for ages.
    Will do more tests at the end of the week and share my results if you like

    written by scrimski on 27/05/2009
  • 7

    Hi,

    Sure, let us know what the results are, and maybe we can add them to my own test results.

    Regards,


    Stefaan

    written by Stefaan Lesage on 27/05/2009
  • 8

    Hi, good post. I have been wondering about this issue,so thanks for posting.

    written by AndrewBoldman on 04/06/2009
  • 9

    I really like your post. Is it copyright protected?

    written by Kelly Brown on 12/06/2009
  • 10

    Hi,

    There is no copyright on this post, feel free to repost it where needed.  It would be nice though if you could link back to this page when reposting the information.

    Regards,


    Stefaan

    written by Stefaan Lesage on 16/06/2009
  • 11

    Really great post.  Thanks!  3 instances on my quad core 4Gb is working really well.

    written by Ehec on 19/06/2009
  • 12

    Hi Ehec,

    Glad you liked it.  Personally I have a Dual Quad Core Mac Pro with 8 GB of RAM and I am getting quite good performance with 5 Compressor instances.

    I could probably add one or two more, but this still leaves all the power I need to do some other stuff while my videos are encoding.


    Regards,

    Stefaan

    written by Stefaan Lesage on 19/06/2009
  • 13

    Back again, was busy a while and couldn’t test my machine, but done now.

    Here we go:
    Model ID: MacPro1,1
    Type of CPU: Dual-Core Intel Xeon
    CPU Speed: 2.66 GHz
    CPUs: 2
    Cores: 4
    L2 Cache (per CPU): 4 MB
    RAM: 5 GB
    Bus: 1.33 GHz

    Compressor 3

    Test 1 was a 2K shot, around seven seconds in length, converting from uncompressed QT to DVCProHD 1080p25. It’s a typical task when onlining your edit from shots delivered by compositors to a playout for FCP; usually you have dozens, sometimes hundreds, of shots like this.

    No cluster used
    0:00:40
    24% CPU

    1 instance
    0:00:39
    30% CPU

    2 instances
    0:00:50
    40% CPU

    3 instances
    0:00:54
    55% CPU

    4 instances
    0:00:55
    62% CPU

    Test 2 was a conversion of a full-length (1h 27 min) feature I’m currently working on, for subtitling and sound design purposes, going down from DV-PAL to 400*300 H264.

    no cluster used
    1:07:03

    1 instance
    1:03:42

    2 instances
    0:48:39

    3 instances
    0:41:27

    4 instances
    0:44:41

    Using clusters and instances seems to pay off when doing larger Compressor jobs. I was a bit disappointed when I found out that I can’t use a cluster to convert Red footage, since Compressor can’t find any video file. Strangely enough, the QuickTime container used tends to disappear in the Finder window but is visible again after a reboot.

    written by scrimski on 19/06/2009
  • 14

    Hi,

    Thanks for getting back and posting your results here.  As you have noticed, the performance gain doesn’t really apply to small files.  What Compressor does is divide a task into smaller chunks of work and send those to each Compressor instance.

    I guess the 7 second file was just too small and couldn’t be split up into smaller chunks, so you don’t really gain any performance from that.

    From my personal experience, using multiple instances starts to pay off with large files as you mentioned, but also when processing multiple small files at the same time.

    If for instance you had to convert your 7 second clip to DVCProHD, to H.264 for Apple TV and to H.264 for iPod, you would have seen a performance boost as well.  In Compressor you can set up a batch which contains multiple Jobs or Files, and each of those can be compressed to its own settings.

    If you have such a situation with your 7 second clip, you will see that Compressor sends each of the 3 tasks to an individual instance of Compressor and processes them simultaneously.

    You might give it a shot using 5 instances of Compressor to see if that improves performance even further.  I have currently set up Compressor to use 5 instances on my machine, and it works great.  From what I can see, your configuration is quite similar to mine (except I have a little more RAM), so it might be worth trying it out.

    Thanks again for getting back to us and posting your test results.

    Best regards,

     

    Stefaan

    written by Stefaan Lesage on 20/06/2009
  • 15

    On the new Mac Pro (Nehalem) hardware, Compressor sees 16 instances. Pretty cool.

    This means that the Mac OS X 10.5.7 build on these Mac Pros has the necessary Intel hyper-threading code, it must.

    (2 threads per core)

    8 cores, but Mac OS X sees them as 16 cores.

    Doing some tests, will let you know results

    written by macguitarman on 02/07/2009
  • 16

    Hey, now that’s interesting.  Personally I don’t have one of those new Nehalem Mac Pros, but it does sound a bit more interesting now.

    If you can find the time to do some tests, could you report back to us with some more information? I would love to know if it does indeed improve performance as well.


    Regards,


    Stefaan

    written by Stefaan Lesage on 05/07/2009
  • 17

    Great, thanks for the article - now can we get another article about rendering over a network cluster!

    written by chris torella on 23/09/2009
  • 18

    This looks like great info, coupled with your last article.

    I am sick of bashing my head against a wall trying to render a 25 min HD video straight from FCP to Compressor; the process is taking around 4 - 5 hours, and that’s on an octo 2.8 with 10 GB memory.

    I will post back with how long it takes, but I hope it’s a LOT quicker than a straight export.

    Cheers for the info

    written by Martyn on 30/10/2009
  • 19

    Further to my last note, I exported a QuickTime reference file and downsampled from 1080p to 720p; this took just under 2 hours, and Compressor then took just under 20 minutes. This saves me between 1.2 and 2 hours!

    I am trying with a slightly different file and exporting at full res out of FCP and down sampling in compressor.

    On my octo 2.8 Mac, 8 processes in compressor was only taking between 30-40% cpu. I will try and up it to 16 and see what happens.

    written by Martyn on 30/10/2009
  • 20

    Hi Martyn,

    Exporting to a QuickTime movie first and then pulling that into Compressor should be faster than your initial workflow.  If you can now get a Virtual Cluster running which uses all your cores to the maximum, you’ll see an even bigger increase in performance.

    If you have set up your Virtual Cluster, make sure you submit your jobs to that virtual cluster (not to This Computer).  Also check that ‘Allow Job Segmentation’ is checked on the encoder tab of the setting you are using.

    If you did everything correctly, you should see it in the Batch Monitor.  Your job will get split into several smaller segments which will then be processed by the different instances of compressor.

    Regards,


    Stefaan

    written by Stefaan Lesage on 30/10/2009
  • 21

    Thanks, just went from a 13:19min encode to a 7:51min encode, 6min 1080XDCAM to HQ DVD.

    Tony

    written by Tony Gay on 18/02/2010
  • 22

    Hi Tony,

    Glad you found this useful. It’s always nice to see you can cut the required render time in half :-)

    Regards,


    Stefaan

    written by Stefaan Lesage on 20/02/2010
  • 23

    I haven’t been able to get QMaster to work without crashing yet—but I’m wondering about the export-to-quicktime step.  I find that exporting a quicktime movie can often be quite time consuming—and, while outputting, you can’t use FC.

    I think in the grid above regarding time savings you should add the step where you output to quicktime.  If, after all, using compressor without a cluster takes an hour, and using compressor with a cluster takes 30 minutes but requires a 20 minute render of a quicktime file, the results aren’t as interesting.

    Please let me know, thanks!

    written by Nick on 22/03/2010
  • 24

    Hi Nick,

    In the sample data provided, it doesn’t really matter at all, since in both cases the file was first exported to a QT Movie anyway.  So in our example we first exported the QT Movie, then used that movie to render with or without a Virtual Cluster.

    I haven’t tried it yet, but I think the latest version of Final Cut allows you to export a QT Movie in the background, so you can still keep on working in FCP.

    I’ll try to get some more data though, but I’m pretty sure that exporting directly from FCP takes even longer, since it has to send the file over to Compressor on a frame-by-frame basis.


    Regards,


    Stefaan

    written by Stefaan Lesage on 23/03/2010
  • 25

    I think Nick makes a fair point. I am trying to speed up my workflow that requires me to export 10 mins of uncompressed animation codec 1080p to 960x540 QT for review. Exporting a QT movie via Quicktime conversion takes the same amount of time as directly exporting an uncompressed QT and that is before sending to compressor. (approx 30 mins)
    So I can see no advantage in using compressor.
    Is there a way of speeding up this process?

    written by Geoffb on 24/03/2010
  • 26

    Hi Geoffb,

    I think I’m missing something.  In your case you should export your uncompressed animation 1080p sequence as a QT Movie using those settings.

    The whole point is to let compressor do the conversion from the uncompressed animation codec into the 960x540 format.

    So try it out.  Export your sequence using the same settings as the sequence itself.  This should be pretty fast, since there is no conversion which needs to be done.  Next step is to bring that uncompressed file into compressor and let compressor do the necessary format changes.

    Regards,

    Stefaan

    written by Stefaan Lesage on 24/03/2010
  • 27

    Thanks Stefan.
    I’m afraid I wasn’t very clear with my explanation.
    I am exporting native 1080p at sequence settings first and then transcoding in compressor.
    Unfortunately the export as 1080p takes about 25-30 mins and then another 25 mins in compressor.
    If I export directly to 960x540 through QT conversion it takes about the same time as the 1080 export so FCP is still out of use for that time and only one process gives the desired result.

    written by Geoffb on 24/03/2010
  • 28

    Hi Geoff,

    Hm, strange indeed.  So in the end, the whole process takes just as long with Compressor as exporting it from FCP in the format you need ?

    In that case, the only difference you have is that after the initial 25-30 minutes of exporting you can start using FCP again, while compressor is crunching on it, while in the other case you would have to wait 50 minutes to use FCP.

    Did you actually try setting up a Virtual Cluster with more than one Compressor instance ? It might speed up the compression in compressor a lot.

    Regards,


    Stefaan

    written by Stefaan Lesage on 24/03/2010