Multi-core processing question

CantabileUser · October 14, 2015, 10:54am

Within Cantabile, I have lots of Kontakt VST instruments, many of them quite large.

Kontakt has an option to choose the number of cores, or to disable multicore processing. They seem to imply that it would be best to disable multicore IF the host is utilizing multicore processing, as there would be a conflict that could disrupt the audio.

So, does Cantabile use multicore processing, and if so should I disable theirs?

brad · October 14, 2015, 11:18am

This is an excellent question! The answer is… it depends.

It depends on how many cores you have, how many plugin instances you have loaded, whether the plugins are routed to each other and more. The idea is to try and match the number of physical processor cores.

By default Cantabile assumes plugins don’t use multi-core processing and sets its multi-core thread count to match the number of physical cores. So if you have multi-core plugins you should either disable the multi-core in the plugins, or reduce the number of threads Cantabile uses.

The question is - which one to disable/reduce. The basic idea is this:

If you’re running one plugin that supports multi-core and nothing else then disable Cantabile’s multi-core and enable the plugin’s multi-core. The rationale here is that there’s nothing else for Cantabile to use those extra cores for if there’s no other plugins - so might as well give them up for the plugin.
If you’re running many plugins in parallel (ie: not routed to each other) then enable Cantabile’s multi-core support and disable the plugin. The rationale being that Cantabile will be able to keep all the cores busy working on different plugins.
If you have fewer plugins than cores and you’re using multi-core plugins, you could split the cores - half for Cantabile, half for the plugin.

Also note that you don’t want to use many multi-core enabled plugins - as they too will end up competing for access to the cores.

So here’s my general advice:

Start by disabling multi-core in your plugins
Enable Cantabile’s multi-core support
If you find a multi-core capable plugin is processing too slowly, adjust to give an extra core or two to that plugin.
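
As a rough illustration only, the rules of thumb above could be written down like this. `split_thread_budget` and its parameters are made-up names for this sketch, not settings that Cantabile or Kontakt expose, and the half/half split is just a starting point for tuning by ear.

```python
def split_thread_budget(physical_cores, parallel_chains, has_multicore_plugin):
    """Return (host_threads, plugin_threads) as a starting point for tuning.

    Hypothetical helper that only encodes the rules of thumb from this post.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if not has_multicore_plugin or parallel_chains >= physical_cores:
        # Many independent plugins: let the host keep every core busy and
        # leave multi-core processing disabled in the plugins.
        return physical_cores, 1
    if parallel_chains <= 1:
        # One multi-core plugin and nothing else: the host has no use for
        # the extra cores, so hand them to the plugin.
        return 1, physical_cores
    # Fewer plugins than cores: split the cores between host and plugin.
    host_threads = max(1, physical_cores // 2)
    return host_threads, max(1, physical_cores - host_threads)


print(split_thread_budget(4, 1, True))  # (1, 4)
print(split_thread_budget(4, 8, True))  # (4, 1)
print(split_thread_budget(4, 2, True))  # (2, 2)
```

Whatever the numbers say, the final check is still whether the audio glitches under load.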

Hyper-threading

If your processor is hyper-threading capable (eg: i7) you need to be very careful not to exceed the number of physical cores.

In a hyper-threaded CPU, you get twice as many virtual cores as physical cores. The CPU can execute multiple instructions on one core simultaneously by using different parts of the core at the same time.

Unfortunately for audio processing most threads typically want to use the same part of the CPU - namely the floating point math processing capabilities - and the end result is heavy contention and a big performance drop.

For hyper-threaded machines I recommend disabling multi-core processing in all plugins and setting Cantabile’s setting to automatic/enabled. For dedicated audio performance machines I’d even recommend disabling hyper-threading entirely.
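
A small sketch of the "don't exceed the physical cores" check, in Python. The function names are invented for illustration, and the estimate assumes every core exposes the same number of virtual cores (2 with hyper-threading on), which may not hold on every CPU.

```python
import os


def physical_core_estimate(logical_cpus, threads_per_core=2):
    """Estimate physical cores from the logical CPU count.

    Assumes every core exposes `threads_per_core` virtual cores
    (2 with hyper-threading enabled, 1 with it disabled).
    """
    return max(1, logical_cpus // threads_per_core)


def is_oversubscribed(audio_threads, logical_cpus, threads_per_core=2):
    """True if more audio threads would run than there are physical cores."""
    return audio_threads > physical_core_estimate(logical_cpus, threads_per_core)


# os.cpu_count() reports logical CPUs, so a 4-core hyper-threaded i7 shows 8.
logical = os.cpu_count() or 1
print(logical, physical_core_estimate(logical))
print(is_oversubscribed(6, logical_cpus=8))  # True: 6 threads on 4 physical cores
print(is_oversubscribed(4, logical_cpus=8))  # False
```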

Cantabile multi-core settings

In Cantabile 3, in Options → Audio Engine you can select how many cores to use.

In Cantabile 2, all you can do is disable multi-core processing.

Neil_Durant · October 14, 2015, 9:52pm

Brilliant reply!! One quick question:

Could you explain the reasoning behind this? I notice Cantabile shows the number of physical cores in automatic mode for “Number of Audio Threads”, which is great. Is it that there’s a chance two audio threads could be scheduled on two virtual cores of the same physical core, where we’d prefer them to be on two physical cores?

Neil

brad · October 14, 2015, 10:29pm

The reason is more to do with the fact that:

a) it’s questionable how much additional performance benefit hyper-threading gives in the first place.
b) given the negative performance impact of multiple threads contending for the same processor core resources (ie: the math unit) it’s a risk you want to avoid.
c) given that plugins are now utilizing multi-core, it becomes very difficult to establish just how many threads will be running during the audio cycle - and if you happen to exceed the number of physical cores, risk of a performance drop is pretty real.

By turning off hyper-threading you avoid that risk.

Theoretically yes. The mysteries of Windows thread scheduling are, well, a mystery and subject to change with OS versions. That said I’ve found that limiting the number of threads to the number of physical cores typically runs fine. Exceeding the number of physical cores quickly causes problems.

Torsten · April 6, 2017, 12:19pm

Hi Brad,

just trying to clarify: does this mean that if I run all my instrument plugins through a final volume and limiter plugin, they will all be processed on one core? This means that my volume plugins create quite a bit of a bottleneck - and my quad-core processor is pretty much a waste…

would it be better to create multiple plugin paths to a single output (i.e. multiple volume and limiter plugins in parallel)? Or is the output another bottleneck?

Looks like I need to re-think my song structure for maximum processor load balancing…

Cheers,

Torsten

terrybritton · April 7, 2017, 12:35am

It is my understanding that Windows process manager splits up your VST’s to separate cores in a round-robin manner – when core number 4 is used, it circles back to put the next VST into core 1. It is not mixing all the digital processing into the final VST. It is sort of a “virtual parallelism”. (In real life, you usually only allow Cantabile to use 3 of your 4 cores so that one is always available to the system.)

So, the digital realm runs in multiple cores in a somewhat parallel fashion. They are all “mixed” to that final VST effects plugin (or plugins) which run in their multiple cores if you have several.

I hope that is clear what I attempted to say there!

Terry

Torsten · April 7, 2017, 12:51am

Hmmm, what worries me is Brad’s statement in the guides

The main thing to understand here is that multiple cores don’t help when processing a task whose input depends on the output of a previous task.

For example, say you have two plugins - an instrument, followed by an effect. There’s no point processing the effect on a separate core because it can’t start its work until the instrument has been processed. In fact, in this case the additional cost of switching between threads would make the process slower.

On the other hand, if you have two instruments they can be processed on separate threads since neither needs the output of the other for its input. If they’re processed on separate threads, the operating system can schedule these to different CPU cores and they can both be processed at the same time.
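
To make the quoted passage concrete, here is a toy calculation of the best-case time for one audio cycle, assuming independent plugins really do run in parallel and there are enough cores. The plugin names and millisecond costs are invented, and thread-switching overhead is ignored.

```python
def cycle_time(costs, inputs):
    """Best-case duration of one audio cycle with unlimited cores.

    costs:  plugin name -> processing time (ms)
    inputs: plugin name -> plugins whose output it needs first
    Assumes there are no feedback loops in the routing.
    """
    finish = {}

    def finish_time(plugin):
        if plugin not in finish:
            # A plugin can only start once its slowest input has finished.
            start = max((finish_time(src) for src in inputs.get(plugin, ())),
                        default=0.0)
            finish[plugin] = start + costs[plugin]
        return finish[plugin]

    return max(finish_time(plugin) for plugin in costs)


# Instrument followed by an effect: strictly one after the other.
print(cycle_time({"synth": 3.0, "reverb": 1.0}, {"reverb": ["synth"]}))  # 4.0

# Two independent instruments: they can overlap.
print(cycle_time({"piano": 3.0, "strings": 2.0}, {}))  # 3.0

# Three instruments feeding one limiter: slowest instrument plus the limiter.
costs = {"piano": 3.0, "strings": 2.0, "organ": 4.0, "limiter": 0.5}
print(cycle_time(costs, {"limiter": ["piano", "strings", "organ"]}))  # 4.5
```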

I read this as “all plugins that are connected in sequence need to be processed in one thread”.

Cheers,

Torsten

terrybritton · April 7, 2017, 2:13am

Well, you really do not have control over which processes run in what cores, but I see your point. What you are alluding to is that it would be better to have each instrument fed to its own independent effect rather than summing them all into a single one. But I’d watch your process monitor to really see what happens when. It may process many VST’s in separate cores, dump their audio into a buffer via a separate thread, mixing on-the-fly, and then feed that buffer into the effect for all we know.

I love ProcessExplorer from TechNet. They just updated it in February of 2017. The download/installer includes both the 32 and 64 bit versions. I always place it into my C:\ root. If you click the first of the graphics at the top, it has a checkbox at the bottom of the window that appears which allows you to view a separate graph for each core.

Process Explorer is really an indispensable tool for me when observing what is using resources up.

Of course, the Windows 10 “Resource Monitor” is useful also (reached from the Task Manager’s Performance Window or type “resource monitor” from the Windows button), but Process Explorer fills in several gaps there for me.

Ya just never know about these things till you look at the meters sometimes!

Terry

papwalker · April 7, 2017, 3:40am

Brad is correct. The operations are serial - must be. There may be certain GUI threading or stereo effects per library that are useful, but by and large parallel processing would be awash with race conditions and non-deterministic bugs.
Hence the quest for increased clock speed and pipelining by CPU manufacturers for so many years.
You have touched on a serious academic issue plaguing Computer Science and Systems Logic since Babbage.

terrybritton · April 7, 2017, 8:24pm

Well, yes, they are serial in that sense, but all kinds of tricks can be done with the stacks and other buffers to make a pseudo-parallelism occur. It all depends on how the VST’s and the host are programmed to handle these matters. What happens in the mixer isn’t even entirely serial, with many effects performing up-sampling first before processing, then downsampling to return the results to the mixer. That stuff often can happen in different cores I should think, especially the stuff handed off to the floating point calculator (or the GPU in rare cases, though that’s becoming more common lately).

But yes, true parallelism is our quest!

Terry

papwalker · April 7, 2017, 10:01pm

Anything with multiple unrelated inputs can be parallel - such as the mixer. But chaining VST libraries is like a manufacturing process: you can’t polish the item before it’s heat treated, you can’t heat treat it before it’s profile turned, and you can’t turn it before it’s cast, etc.
Henry Ford attacked the problem by putting one core per operation, so to speak: the production line (pipelining).
I’ve battled this issue for twenty years with relational database systems and the boffins have not yet come up with a practical solution except more speed.
Considering Georg Cantor, Kurt Gödel and Alan Turing’s work, it is likely not possible to parallelize non-deterministic operations, and perhaps the question itself is unanswerable.
Maybe some spooky quantum effect might work but I think it’s a long way off.

Torsten · April 8, 2017, 12:11am

Hmm, before getting into Gödel’s incompleteness theorem and Turing’s theory of computability (reminds me of my computer science studies some 30 years ago…), just one concrete question to @brad before I dive into re-arranging my song files:

Are Cantabile’s output ports / port buffers parallel-processing-capable? I.e. if I have two parallel chains of plugins that feed into the same output port, can they be processed in parallel threads (and by two separate cores)?

Second question: do I understand your description correctly that if I use the same volume plugin to control output volume in multiple racks, I need to turn on “aggressive multitasking” to be able to have these racks processed in parallel?

Then at least I could re-structure my processor-intensive songs with multiple separated chains to a single output.

@brad: great if you could help clear this up!

Cheers,

Torsten

terrybritton · April 8, 2017, 1:09am

Well, it seems like your reasoning is perfectly sound - make comprehensive instrument racks that contain the output effects on a per-VSTi basis, rather than porting all your VSTi’s to a common effects rack they all share, if I’m following you. That way each VSTi/FX system could live in its own core, if I’m following you.

Brad should be along to answer us soon - I believe he is working on adding a quantum computing element to Cantabile this week.

Terry

P.S. - if Brad is not working on that quantum computing feature, consider this a feature request!

brad · April 8, 2017, 1:38am

Hey All,

Sorry for the slow reply while I’m travelling.

To sum up Cantabile’s execution model:

Everything is grouped into “Execution Groups”. A group is a set of things that are always processed together. eg: It might be all the input processing, or for a plugin it will be the plugin’s input mixers, the plugin itself and its output mixers (ie: everything related to processing that one plugin).
Execution Groups have “precedents”. A precedent is anything that must be processed before this execution group can be processed. eg: all plugins have the input group as a precedent. A plugin that’s processing the output of another plugin will have that other plugin as a precedent.
The audio engine maintains a list of execution groups that currently have no pending precedents - ie: “ready execution groups”. These are dispatched to worker threads so if two or more groups are ready at the same time they’ll be processed in parallel.
When an execution group has finished processing it’s flagged as completed. When all of a group’s precedents are finished that group becomes ready and will be scheduled for execution as soon as a spare thread is available.

So suppose you have two separate chains of 3 plugins and then each chain feeds into a single plugin.

P1 -> P2 -> P3 --+
                 +--> P7
P4 -> P5 -> P6 --+

P1, P2 and P3 will be processed sequentially, one after the other. Same for P4, P5 and P6. Note though that both chains will be processed in parallel. P7 will execute once both P3 and P6 have finished.
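
For anyone who prefers to see it as code, here is a minimal Python sketch of that ready-list idea applied to the P1..P7 example. It is not Cantabile's actual engine: `run_cycle`, the dictionary layout and the no-op `process` callback are all assumptions made for illustration, and the input group is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_cycle(precedents, process, workers=4):
    """Process every group once, respecting precedents.

    precedents: group -> set of groups that must finish first.
    Returns the groups in the order they were seen to complete.
    """
    pending = {group: set(pre) for group, pre in precedents.items()}
    running = {}   # future -> group currently on a worker thread
    order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # "Ready" groups have no unfinished precedents left.
            ready = [g for g, pre in pending.items() if not pre]
            if not ready and not running:
                raise ValueError("circular or unknown precedent")
            for group in ready:
                del pending[group]
                running[pool.submit(process, group)] = group
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                group = running.pop(future)
                future.result()  # re-raise any error from the worker
                order.append(group)
                # Flag as completed so dependents can become ready.
                for pre in pending.values():
                    pre.discard(group)
    return order


precedents = {
    "P1": set(), "P2": {"P1"}, "P3": {"P2"},
    "P4": set(), "P5": {"P4"}, "P6": {"P5"},
    "P7": {"P3", "P6"},
}
order = run_cycle(precedents, process=lambda group: None)
print(order)  # the two chains may interleave, but P7 is always last
```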

Make sense?

Brad

David · April 8, 2017, 5:11am

Much good information here. That’s a very clear explanation in your last post Brad.
I’m in the process (pun half intended) of upgrading my main studio rig from a Q9450 to an i7 7700K over the next few weeks, as funds become available, so this discussion on hyperthreading is particularly useful.

papwalker · April 8, 2017, 6:28am

The average Fred really has no clue how much mathematics there is in music.
I view music as a form of pure and applied math.
It is really strange that it exists at all and we appreciate and react to it on such an instinctual level.
Implementing via a digital machine is a whole new discipline.
( … and a tidy little earner )

papwalker · April 8, 2017, 6:33am

I’m working on a Schrödinger’s Cat VST that can do everything!

Neil_Durant · April 8, 2017, 11:58am

Surely those would be different instances of the plugin, and thus entirely independent in terms of processing?

Neil

Torsten · April 8, 2017, 2:51pm

That’s not quite what @brad’s guide to multi-core says:

Multi-Core Mode:

Compatibility Mode provides significant performance increases when running most multi-rack songs and is the recommended mode for most situations. In this mode racks are processed in parallel but processing will stall if two or more plugins of the same type need to be processed at the same time – in which case they will be processed one after the other.

Aggressive Mode is suitable when running many racks with the same plugins on each rack. In this mode the plugins being used must be compatible. Many plugins are compatible with this mode, but those that aren’t can cause undesirable effects ranging from noise to crashing the entire application.

When a song contains no duplicate plugins Compatibility Mode and Aggressive Mode are effectively equivalent.
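
One way to picture the difference between the two modes is a lock per plugin type. This is purely a guess at the mechanism for illustration, not how Cantabile is known to implement it; `TypeGate` and `peak_overlap` are invented names, and `time.sleep` stands in for real audio processing.

```python
import threading
import time
from collections import defaultdict


class TypeGate:
    """Hypothetical model: same-type plugins take turns unless aggressive."""

    def __init__(self, aggressive=False):
        self.aggressive = aggressive
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def process(self, plugin_type, work):
        if self.aggressive:
            return work()            # no serialisation at all
        with self._guard:            # protect the lock table itself
            type_lock = self._locks[plugin_type]
        with type_lock:              # one instance of each type at a time
            return work()


def peak_overlap(gate, plugin_types):
    """Run one thread per rack and report the most racks busy at once."""
    active = 0
    peak = 0
    counter = threading.Lock()

    def work():
        nonlocal active, peak
        with counter:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)             # stand-in for processing a buffer
        with counter:
            active -= 1

    threads = [threading.Thread(target=gate.process, args=(t, work))
               for t in plugin_types]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


# Four racks that each contain the same volume plugin.
print(peak_overlap(TypeGate(), ["volume"] * 4))                 # 1
print(peak_overlap(TypeGate(aggressive=True), ["volume"] * 4))  # timing dependent
```

In this model the compatibility gate only costs anything when two instances of the same type are due at the same moment, which fits the statement that the modes are equivalent when a song has no duplicate plugins.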

I interpret this as: if I use the same volume/pan control plugin in all my racks, I need to use Aggressive Mode to make sure these racks can be processed in parallel. @brad: correct?

Cheers,

Torsten

Neil_Durant · April 8, 2017, 4:05pm

Ah yes, reading that, I think you’re right. Perhaps it’s time to consider using Cantabile faders instead of a volume plugin. They work smoothly with bindings, with a nice response curve. I use bindings to a Cantabile fader in all my instrument racks.

On the other hand, a simple volume plugin is unlikely to occupy much CPU time, so perhaps the chances of thread contention are low, even if you have many instances.

Neil