Comparing the performance of v2 and v3

brad · October 20, 2015, 10:43pm

I’ve had a couple of users report that performance of v3 seems to be slightly worse compared to v2.

Cantabile 3’s audio engine is faster than v2’s. More importantly, it’s lock free, which dramatically reduces the chance of interruptions to audio processing - that’s what I mean by more stable.

Also, I haven’t really performance tuned things for quite a while now and I know racks are working harder than they need to be. Performance tuning will be part of the “fine tuning” over the next month or so.

Bear with me… it should get better.

dave_dore · October 23, 2015, 10:18pm

+ = V3

couldn’t resist

brad · October 23, 2015, 11:10pm

Although in reality it’s often more like this:

http://imgur.com/JqylNr2

brad · October 27, 2015, 1:35pm

For the last few days I’ve been doing a deep dive on v3’s performance.

Although it’s pretty late in the game and this will delay things a little, I’ve decided to change the execution model Cantabile uses to dispatch work items across multiple cores. The problem is always getting the size of the work items right - big enough that it’s worth dispatching to another thread, but small enough that there’s enough work to keep each core busy.

In build 3097 the work items were very small, but the execution engine coalesced these small items together to make larger batches of work that could be executed together. This worked reasonably well, but there was a fair bit of overhead during each audio cycle to work out the execution plan.

In the new execution model, responsibility for these groupings has been moved from the execution engine to the higher level application objects. This gives two advantages - the execution groups are more controlled and much of the execution planning can be done up front rather than on every cycle.
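As a rough illustration of "planning up front rather than on every cycle", here is a minimal sketch. It is not Cantabile's code: the `Engine` class, `build_plan` function and the node names are all hypothetical, and Python's standard `graphlib` is used only as a convenient way to group a dependency graph into stages. The point is simply that the grouping is computed once when the graph changes, and each audio cycle just walks the stored plan.

```python
from graphlib import TopologicalSorter

def build_plan(deps):
    """Group nodes into stages; nodes in the same stage don't depend on each other."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    plan = []
    while ts.is_active():
        stage = sorted(ts.get_ready())  # everything runnable right now
        plan.append(stage)
        ts.done(*stage)
    return plan

class Engine:
    """Hypothetical engine: the plan is built when the graph changes, not per cycle."""
    def __init__(self, deps):
        self.plan = build_plan(deps)

    def process_cycle(self, run):
        # Per-cycle work is just iteration over the precomputed stages.
        for stage in self.plan:
            for node in stage:
                run(node)

# Hypothetical graph: each node maps to the set of nodes it depends on.
deps = {
    "synth_a": set(),
    "synth_b": set(),
    "mixer": {"synth_a", "synth_b"},
    "output": {"mixer"},
}

engine = Engine(deps)
order = []
engine.process_cycle(order.append)
print(engine.plan)
```

In this toy graph the two synths land in the same stage, so they are the candidates for running on different cores, while the mixer and output have to wait for them.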

It’s not completely working yet (haven’t really tested racks at all), but plugin processing is looking promising. I’ve been testing with 64 instances of a simple VSTi, with no notes sounding (since I’m interested in the load in Cantabile, not the plugins).

Here’s how the timing looked in 3097:

00078386 [3940:2]: Parallel Thread: exec:2648861 wakes:9029 soft:2 hard:2615 locks:55610 lockrate:6.2
00078387 [6176:2]: Parallel Thread: exec:6054086 wakes:10326 soft:0 hard:9028 locks:658256 lockrate:63.7
00078387 [6176:2]: Parallel executor (execute): 10326 hits, avg: 1103 ticks (0.403ms), max: 4927 (1.802ms)
00078387 [6176:2]:    avg: 0 ticks (0.000ms) max: 2 ticks (0.001ms) - Lock taken
00078387 [6176:2]:    avg: 479 ticks (0.175ms) max: 1239 ticks (0.453ms) - Prepared
00078387 [6176:2]:    avg: 1102 ticks (0.403ms) max: 4927 ticks (1.802ms) - Tasks executed

Here’s the new execution model:

00070726 [3728:2]: Parallel Thread: exec:324315 wakes:6959 soft:0 hard:3507 locks:338288 lockrate:48.6
00070727 [5336:2]: Parallel Thread: exec:387199 wakes:10502 soft:0 hard:6958 locks:401115 lockrate:38.2
00070727 [5336:2]: Parallel executor (execute): 10502 hits, avg: 786 ticks (0.288ms), max: 1866 (0.682ms)
00070727 [5336:2]:    avg: 0 ticks (0.000ms) max: 29 ticks (0.011ms) - Lock taken
00070727 [5336:2]:    avg: 90 ticks (0.033ms) max: 803 ticks (0.294ms) - Prepared
00070727 [5336:2]:    avg: 786 ticks (0.287ms) max: 1865 ticks (0.682ms) - Tasks executed

Breaking this down:

Comparing the first two lines in each test, you can see the load is much more evenly distributed across the two threads. In 3097 one thread is doing more than double the work of the other.
Still on those two lines, you can see the total number of executed items dropped from about 8.7 million to about 700 thousand, reflecting the larger size of each unit of work.
Comparing the third line, the average execution time has dropped from 0.4ms to about 0.29ms (roughly a 28% improvement).
Still on the third line, the maximum execution time has dropped from 1.8ms to less than 0.7ms - more than twice as fast.
On the fifth line the average time to prepare the execution has dropped from 0.175ms to 0.033ms - this represents the dramatic reduction in work required to prepare the execution plan because much of it is now pre-planned.
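
For anyone who wants to re-derive the figures in this breakdown, here is a small sketch that pulls the numbers straight out of the log lines quoted above. The regular expressions and the `summarize` helper are just for illustration (they assume the log format shown in this post and nothing more).

```python
import re

# Thread and executor summary lines copied from the two runs quoted above.
LOG_3097 = """\
Parallel Thread: exec:2648861 wakes:9029 soft:2 hard:2615 locks:55610 lockrate:6.2
Parallel Thread: exec:6054086 wakes:10326 soft:0 hard:9028 locks:658256 lockrate:63.7
Parallel executor (execute): 10326 hits, avg: 1103 ticks (0.403ms), max: 4927 (1.802ms)
"""
LOG_NEW = """\
Parallel Thread: exec:324315 wakes:6959 soft:0 hard:3507 locks:338288 lockrate:48.6
Parallel Thread: exec:387199 wakes:10502 soft:0 hard:6958 locks:401115 lockrate:38.2
Parallel executor (execute): 10502 hits, avg: 786 ticks (0.288ms), max: 1866 (0.682ms)
"""

def summarize(log):
    """Extract per-thread exec counts and the executor's avg/max times (ms)."""
    execs = [int(n) for n in re.findall(r"exec:(\d+)", log)]
    timing = re.search(
        r"avg: \d+ ticks \(([\d.]+)ms\), max: \d+ \(([\d.]+)ms\)", log
    )
    avg_ms, max_ms = (float(x) for x in timing.groups())
    return {
        "total_items": sum(execs),
        "imbalance": max(execs) / min(execs),  # busiest thread vs. least busy
        "avg_ms": avg_ms,
        "max_ms": max_ms,
    }

old, new = summarize(LOG_3097), summarize(LOG_NEW)
avg_gain = (old["avg_ms"] - new["avg_ms"]) / old["avg_ms"]
max_speedup = old["max_ms"] / new["max_ms"]

print(f"items executed: {old['total_items']:,} -> {new['total_items']:,}")
print(f"thread imbalance: {old['imbalance']:.2f}x -> {new['imbalance']:.2f}x")
print(f"average time reduced by {avg_gain:.1%}")
print(f"worst case is {max_speedup:.2f}x faster")
```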

So in practice what does this look like on the load meter?

In 3097 the load varied from about 15 to 30%, occasionally dropping to 7% and occasionally spiking as high as 70%.
In the new build it’s much more stable and sits around the 8% mark (+/- maybe 2%).

I call that a win! Well worth the crazy number of code changes, but now I need to stabilize it.

Having said all that, v2 runs the same 64 plugins at about 5% - but the only reason it can do that is because of its much simpler routing capabilities. v3 needs more audio mixers per plugin - but even so, it works out to less than 0.05% load per plugin for the extra capabilities - probably worth it. Either way, I’ll take a stab at trimming that down too.
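
The per-plugin figure can be checked with a couple of lines. This is only a back-of-envelope sketch: it assumes the "around 8%" reading for the new build and the "about 5%" reading for v2, both of which are approximate load meter values.

```python
plugins = 64
v2_load = 5.0  # percent, approximate load meter reading in v2
v3_load = 8.0  # percent, approximate load meter reading in the new v3 build

# Extra load attributable to v3's richer routing, spread across the plugins.
extra_per_plugin = (v3_load - v2_load) / plugins
print(f"{extra_per_plugin:.4f}% extra load per plugin")
```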

Neil_Durant · October 27, 2015, 3:11pm

Sounds excellent!!! Would you get the same kinds of benefits when you have a much less balanced set of plugins, in terms of processing, channel count etc? Or is this entirely a matter of number of work items/cores?

brad · October 27, 2015, 10:31pm

Hi Neil,

How well balanced the load ends up really is a factor of the combination and connection of plugins you’re using. If you use one heavy and one light plugin then obviously the load will be skewed. If you’re using a large number of parallel plugins the load should be fairly evenly distributed. If you’re using chains of connected plugins they won’t balance (since there’s no point), but if you have a couple of different chains in parallel, they should.

Basically it works in such a way that as soon as it finds two or more groups that can be executed in parallel, the first is retained by the current thread while the others are put in a ready list and picked up by the next available other worker.
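
A toy sketch of that dispatch rule, for illustration only. It is not how Cantabile implements it: the real engine is lock free, whereas this uses Python's blocking `queue.Queue` purely to show who runs what. The `worker` and `execute` functions and the "chain-N" group names are hypothetical.

```python
import queue
import threading

def worker(ready, done):
    # Worker threads wait on the ready list and run whatever group arrives.
    while True:
        group = ready.get()
        if group is None:  # sentinel: shut down
            break
        name, fn = group
        # Record which thread ran the group (dict writes are atomic in CPython).
        done[name] = (threading.current_thread().name, fn())
        ready.task_done()

def execute(groups, ready, done):
    # The current thread keeps the first group; the others go on the ready list.
    (first_name, first_fn), *others = groups
    for group in others:
        ready.put(group)
    done[first_name] = (threading.current_thread().name, first_fn())
    ready.join()  # wait until the workers have finished the rest

ready = queue.Queue()
done = {}
caller = threading.current_thread().name

workers = [
    threading.Thread(target=worker, args=(ready, done), name=f"worker-{i}")
    for i in range(2)
]
for w in workers:
    w.start()

# Four independent groups, e.g. four parallel plugin chains.
groups = [(f"chain-{i}", (lambda i=i: i * i)) for i in range(4)]
execute(groups, ready, done)

# Stop the workers.
for _ in workers:
    ready.put(None)
for w in workers:
    w.join()

for name in sorted(done):
    print(name, "ran on", done[name][0])
```

Here "chain-0" always runs on the calling thread, and the other three are picked up by whichever worker is free first.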

Brad

brad · October 31, 2015, 7:14am

Just a quick update: I’ve spent the last few days just testing and stabilizing this new execution model and I can now run it fairly reliably and it’s functionally complete.

It turned into a bigger set of changes than I originally anticipated though:

dave_dore · October 31, 2015, 10:55am

More like a substantial engine redesign than a performance tune-up! I’m anxious to test drive it. Thanks for your efforts!

brad · October 31, 2015, 12:16pm

I wouldn’t go that far. All the audio and MIDI processing paths are still the same - it’s just the way the execution of each node is invoked. Still, a significant change.

brad · November 3, 2015, 2:43am

This is available now in 3098. Note these changes won’t improve the performance of plugins - they just trim down the overhead introduced by Cantabile when running very complex execution graphs (i.e. many plugins/racks).

(btw: if you’re interested in peeking into what’s going on under the covers, try the Tools -> Debug -> Dump Execution Plan command).

aleph75 · November 13, 2015, 12:39am

Hello, for my tired eyes, the new interface is so much more comfortable!

brad · November 13, 2015, 3:29am

Thanks Pablo, I’ve spent a lot of time on it so I’m glad you like it.

NormB · December 5, 2015, 7:22pm

I really appreciate the current level of stability I am experiencing with Build 3104. I have been tuning and trying out a range of software pianos and currently have FOUR VSTs loading at startup. This includes two big sampled pianos. State switching has been fast and, for me, flawless.

Alexander · September 10, 2016, 6:31am

Hi folks!

Has anybody tested V3 vs V2 performance recently?

kind regards, Alexandr

brad · September 12, 2016, 1:48am

Hi @Alexander

Sounds like you want an independent answer from someone other than myself, but I’ll give you my opinion on it:

The load meter in Cantabile 3 generally sits a little higher than in v2. This is because v3’s audio pipeline is more complicated in order to handle the new routing capabilities.
Cantabile 3’s audio engine is completely “lock free”. This means that it’s less susceptible to glitches than v2. Not that v2’s engine is bad, it’s just a different approach that’s more likely to get caught out.
In other words, v3’s load meter generally sits a little higher but also shouldn’t fluctuate as much as v2.

Another way of looking at this: if you’re running a setup where every 1% of load matters then you’re probably running too close to the edge anyway.

Interested to hear others’ opinions on v3’s performance too…

Brad

Neil_Durant · September 12, 2016, 11:52am

My qualitative observations are roughly in line with what Brad says. In my experience over a wide range of songs of different complexity, V3 seems to run slightly higher (a few percent), but generally remains very stable. With V2 I saw more variation and peaking. I personally prefer the behaviour of V3 because despite the slight increase, it feels more dependable. Although none of my songs take it over about 40% (even with a dozen or more plugins running), I’d be fairly happy to run it at 80-90% live if I had to, whereas I started to get cautious with V2 once it reached about 60% because of the occasional peaks.

Neil

Corky · September 12, 2016, 4:42pm

And…I agree with @Neil_Durant. I wanted so much for V2 to work for me, but the stability wasn’t there. It would crash on many of my prime plugs. I used other programs, and finally settled on Live Professor. Though it was a slow load, it was there when I needed it. But, if I tried to complicate the setup past two plugins, it would get a little unreliable, and would require a reboot. Since I play live, this was unacceptable. I searched for a long time and tried many different VST hosts and wound up using some DAWs, but most of them were slow and added an additional load. I finally went to Ableton, using information from the many church musicians on YouTube who used it for Sunday morning services for a handful of songs, and lighting, etc. It worked very well except for the fact that a song setup and related switching was very tedious. Since I have nearly 500 songs I have to load between several groups I perform with, I do not have the time to dedicate to song “surgery”. Then…I saw where Brad created V3 with a new engine. Hmmmmm?! I tried it. Once I got past the initial learning curve, I was elated! The stability was amazing, the setup was easy, and all my plugins worked. Then…when I discovered the power of this new version, and the possibilities under the hood, I became a V3 junkie and my performances changed from gear fumbling to actually interacting with the band and audience. This past weekend V3 worked flawlessly, and I received many compliments mainly because I was able to spend my time on stage performing, and my breaks in the crowd visiting, instead of correcting problems. Yeah, I’ll take V3 over V2 or anything else I’ve tried!

Corky · September 12, 2016, 4:46pm

@brad ( now about that endorsement deal I discussed…lol)

Alexander · September 12, 2016, 7:01pm

Thank you guys!

And finally, two little tests from myself.
The results are… audible with the naked ear

8 “Omnisphere 2” instances with “High church - Use Live Mode” preset

V2 - big dropouts and glitches.
V3 - almost no noticeable effects

3 “Audjoo Helix” instrument instances with “Burn My CPU” preset

V2 - resulting dropouts
V3 - working ok

kind regards, Alexandr