Synchronization of MIDI and Audio when Recording

Can the MIDI and Audio tracks produced during a Cantabile recording be synchronized with each other?

I (foolishly?) assumed that all tracks - MIDI + Audio - were time-synched to the sample. I’ve been relying on this in my latest round of latency testing with Cantabile. I now have an example which does not look possible:

I am measuring the latency of the Respiro VST. At the top is the stereo waveform that was triggered by the MIDI in the bottom track. The audio begins more than 30 samples before the start of the MIDI note, which should be impossible.
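One way to read a plot like this numerically is to convert the MIDI note-on position from ticks to samples and compare it with the first audio sample that crosses a threshold. The sketch below assumes a single constant tempo, a known PPQ (ticks per quarter note) and a mono list of float samples in the range -1 to 1; the function names and the 0.01 threshold are hypothetical choices for illustration, not anything Cantabile exposes. A negative result means the audio started before the MIDI note, which is the symptom described here.

```python
def ticks_to_samples(ticks, ppq, tempo_us_per_quarter, sample_rate):
    """Convert a MIDI tick position to a sample position (constant tempo assumed)."""
    seconds = ticks * tempo_us_per_quarter / (ppq * 1_000_000)
    return seconds * sample_rate


def first_onset(samples, threshold):
    """Index of the first sample whose magnitude reaches the threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def midi_to_audio_offset(note_on_ticks, ppq, tempo_us, sample_rate, audio,
                         threshold=0.01):
    """Audio onset minus MIDI note-on, in samples. Negative means audio came first."""
    midi_pos = ticks_to_samples(note_on_ticks, ppq, tempo_us, sample_rate)
    audio_pos = first_onset(audio, threshold)
    if audio_pos is None:
        raise ValueError("no audio onset above threshold")
    return audio_pos - midi_pos
```

For example, at 120 BPM (500,000 microseconds per quarter note) with 480 PPQ, a note-on at tick 480 falls at 0.5 s, i.e. sample 22050 at 44.1 kHz; if the first audible sample is at index 22015, the function returns -35.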

I am hoping someone can shed light on this … is it possible to get them synchronized (i.e., so that the MIDI and Audio recordings begin at the same point)? I have been doing a lot of testing that relies on this, and now that work is in jeopardy …

Well … this Gearspace discussion (over a 10-year period) …

… implies it ain’t happnin’. MIDI seems to have inherent latency, slop, jitter, serialization issues, and on and on …
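Part of the "serialization" issue can be put in numbers. MIDI 1.0 over a 5-pin DIN cable runs at 31,250 bits per second, and each byte is framed by a start bit and a stop bit, so one byte takes 320 microseconds on the wire. A minimal sketch, covering only wire time on the serial (DIN) path and not keyboard scanning, driver buffering, or USB MIDI; the names are illustrative:

```python
MIDI_BAUD = 31_250   # MIDI 1.0 DIN serial rate, bits per second
BITS_PER_BYTE = 10   # 1 start bit + 8 data bits + 1 stop bit


def serial_time_ms(num_bytes):
    """Time to clock num_bytes onto a 5-pin DIN MIDI cable, in milliseconds."""
    return num_bytes * BITS_PER_BYTE / MIDI_BAUD * 1000
```

A three-byte note-on takes about 0.96 ms. With running status (the repeated status byte omitted), each further note-on in a chord adds two bytes, so a ten-note chord is 3 + 9 × 2 = 21 bytes, and the last note finishes arriving about 6.7 ms after the first one starts.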

Any info or thoughts on the subject of measuring latency between MIDI and Audio events (like "how long does it take this VST to respond to MIDI events?") would be appreciated …

One approach to comparing MIDI with subsequent audio that is triggered by the MIDI is to convert the MIDI events to audio with a custom-wired MIDI cable. The multi-tracked audio can then be directly compared with no issues of synchronization between MIDI and Audio files. The wiring is easy (one resistor at the simplest - see below).

Two resources discuss this:

In addition, I’ve begun using an excellent little stand-alone PC app for testing the latencies within the audio interface: Oblique Audio’s RTL Utility. RTL = Round Trip Latency. You patch an output on your interface back to an input, run the RTL Utility, and it disgorges tons of interesting info about latency on that round-trip path.
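RTL Utility's internal method isn't described here, so as a generic illustration of the idea (not RTL Utility's actual algorithm): a round-trip delay can be estimated by sending a known test signal out of the interface, recording the loopback, and finding the lag at which the recording best matches what was sent. The sketch below uses brute-force cross-correlation on plain Python lists; `estimate_delay` and its parameters are hypothetical names.

```python
def estimate_delay(sent, received, max_lag):
    """Return the lag (in samples) at which `received` best matches `sent`.

    Brute-force cross-correlation: for each candidate lag, sum the products
    sent[n] * received[n + lag] and keep the lag with the largest sum.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            s * received[n + lag]
            for n, s in enumerate(sent)
            if n + lag < len(received)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sample rate gives the round-trip time in seconds. For long recordings an FFT-based routine such as `scipy.signal.correlate` would be far faster than this double loop.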

By the way, here is the MIDI + Audio cable from the Sound On Sound articles:


If you're wanting to measure only the delay between a plugin receiving an actionable MIDI message and it outputting sound (which is what your initial picture seems to indicate), none of the transmission-related latency mentioned in the articles matters for that measurement. Is that what you're wanting to measure?

The crux of the problem for me is that, when Cantabile records a MIDI file and a WAV file simultaneously, the two are not aligned closely enough for accurate measurements. All the audio tracks in the WAV are lined up with each other, and all the MIDI tracks seem to be lined up with each other, but there does not seem to be alignment between MIDI and Audio.

The approach outlined in the articles gives a way to convert MIDI to audio (basically a full-scale negative deflection, an audio BLIP) using a modified MIDI cable with an unbalanced wire pair that I can plug into my audio interface (e.g., with a 1/4" TS connector) and record alongside the other audio.
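Once the blip and the synth output are channels of the same multitrack recording, the latency is just the distance between two sample indices. A minimal sketch, assuming both tracks are lists of float samples in the range -1 to 1 and that the blip really is a negative-going pulse as described; the thresholds (-0.5 for the blip, 0.01 for the synth) and the function names are hypothetical and would need tuning to the actual recording:

```python
def first_below(samples, threshold):
    """Index of the first sample at or below a negative threshold (the MIDI blip)."""
    for i, s in enumerate(samples):
        if s <= threshold:
            return i
    return None


def first_above_magnitude(samples, threshold):
    """Index of the first sample whose magnitude reaches threshold (synth onset)."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def blip_to_audio_latency_ms(blip_track, audio_track, sample_rate,
                             blip_threshold=-0.5, audio_threshold=0.01):
    """Latency from the MIDI blip to the synth's first audible sample, in ms.

    Both tracks must come from the same multitrack recording so they share
    one sample clock; that shared clock is the point of the cable.
    """
    blip = first_below(blip_track, blip_threshold)
    onset = first_above_magnitude(audio_track, audio_threshold)
    if blip is None or onset is None:
        raise ValueError("could not find both the blip and the audio onset")
    return (onset - blip) / sample_rate * 1000
```

One caveat: the first edge of the blip marks the start of the note-on's status byte, but a receiving device can only act once the whole three-byte message has arrived, about 0.96 ms later on a DIN cable. Depending on which moment matters, it may make sense to subtract that from the result.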

I'm still not clear exactly what it is you are wanting to measure. If it's the plugin itself, the transmission latency that the articles describe isn't relevant. If it's the transmission you want to measure, then yes, the cable is going to help, but it doesn't capture whatever latency the keyboard has between the keypress and the actual outbound MIDI signal. So I get that there is a misalignment in Cantabile (and I'm not sure why that is), but other than that issue, what is the root problem you're wanting to solve? If it's the keypress-to-actual-sound latency that you really want to measure, it's difficult to do but possible. However, before going down that rabbit hole it's always good to do a reality check with the one-foot rule: sound travels about 1 ft per millisecond. So if the measured latencies of concern are in the 3-4 ms range, I'd argue they aren't relevant. If they're in the 30-40 ms range, they're definitely relevant, but the offset between the MIDI and audio is mathematically insignificant in that case. JMO of course.
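The one-foot rule is easy to turn into a quick conversion. The sketch below shows both the rule of thumb and the physical figure, assuming a speed of sound of 343 m/s (dry air at roughly 20 degrees C), which works out to about 1.125 ft per ms; the constant and function names are illustrative only.

```python
FEET_PER_MS_RULE = 1.0           # the "one-foot rule" of thumb
SPEED_OF_SOUND_M_PER_S = 343.0   # assumed: dry air at roughly 20 degrees C
FEET_PER_METER = 3.28084


def rule_of_thumb_feet(latency_ms):
    """Equivalent listening distance using the one-foot rule (1 ft per ms)."""
    return latency_ms * FEET_PER_MS_RULE


def physical_feet(latency_ms):
    """Equivalent distance using the speed of sound (about 1.125 ft per ms)."""
    return SPEED_OF_SOUND_M_PER_S * FEET_PER_METER * latency_ms / 1000
```

For the 3-4 ms range the rule gives 3-4 ft (about 3.4-4.5 ft physically), and for 50 ms it gives 50 ft (about 56 ft), so the rule of thumb slightly understates the distance but is close enough for a reality check.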

Also, it's worth noting that the offset in the example looks like about 30 samples. Assuming a 44.1 kHz sample rate, that equates to about 0.68 ms. So depending on your accuracy needs, perhaps measure and then quote the result as ±0.7 ms?
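The samples-to-milliseconds arithmetic is worth keeping handy, since the same 30-sample offset means a slightly different time at other sample rates. A minimal sketch (function names are illustrative):

```python
def samples_to_ms(num_samples, sample_rate=44_100):
    """Convert a sample count to milliseconds at the given sample rate."""
    return num_samples / sample_rate * 1000


def ms_to_samples(ms, sample_rate=44_100):
    """Convert milliseconds to a (fractional) sample count."""
    return ms * sample_rate / 1000
```

30 samples at 44.1 kHz is 30 / 44100 × 1000 ≈ 0.680 ms, and the same 30 samples at 48 kHz is 0.625 ms. Going the other way, a ±0.7 ms tolerance is about ±31 samples at 44.1 kHz.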

Thanks for the input @sekim … it really does help to have another perspective!

My overall situation is a rather complex rig with a lot of configuration possibilities. Currently overall latency is too high for comfort (close to 50 ms typical). I have a lot of things I can tweak: breath reaction times on my wind synth (a bunch of settings there), S/PDIF vs. analog audio out of my hardware sound modules, serial vs. USB MIDI, etc. There are also general routing questions, like whether I run my looper in-line with my signal or as a sidechain, and whether I bring my sound module sounds back into Cantabile for FX or daisy-chain a hardware FX unit.

The latencies are really easy to see when the audio traces are all lined up. However, I really do need to know when the MIDI appears, and the MIDI visually moves all over the place. I don't quite know why (maybe Cantabile, maybe jitter), but I gotta nail the MIDI event to be able to tell how to configure things to lower the latency in each segment of my rig, and to make the other choices in a reasonably informed way …
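If each point of interest in the rig is recorded as an audio channel on the same interface (with the MIDI turned into a blip by the modified cable discussed earlier in the thread), the per-segment budget falls out of simple subtraction. A minimal sketch, assuming an onset time has already been picked for each tap point; the tap names and times below are hypothetical placeholders, not measurements from this rig:

```python
def segment_latencies(onsets_ms):
    """Split total latency into per-segment latencies.

    onsets_ms: ordered list of (tap_name, onset_ms) pairs, first point in the
    chain to last, all measured on the same recording clock.
    Returns (segments, total), where segments is a list of
    (from_tap, to_tap, latency_ms) tuples.
    """
    segments = [
        (a_name, b_name, b_t - a_t)
        for (a_name, a_t), (b_name, b_t) in zip(onsets_ms, onsets_ms[1:])
    ]
    total = onsets_ms[-1][1] - onsets_ms[0][1]
    return segments, total


# Hypothetical tap points and placeholder times, for illustration only.
taps = [("midi_blip", 0.0), ("module_out", 12.0), ("cantabile_out", 20.5)]
segments, total = segment_latencies(taps)
for start, end, ms in segments:
    print(f"{start} -> {end}: {ms:.1f} ms")
print(f"total: {total:.1f} ms")
```

Because every number comes from the same sample clock, the MIDI/audio misalignment in Cantabile's separate MIDI file drops out of the picture entirely.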

I’ve posted my early work in the Latency Testing thread.

I'll say that when I just plug my headphones into the wind synth directly, the sound is not great, but the immediacy is wonderful … as Matt Traum commented, "It's like the sound is coming directly out of your mouth" (he was actually referring to a wind synth linked to an analog modular synth controlled by CV).

Thanks again!

If the system latency is ~50 ms, I can definitely see why you want to lower it. That said, the 0.68 ms offset in the example is mathematically insignificant. That doesn't mean it should be there; it just means the tool you're using to make the measurement should suffice even with the issue. Again, 0.68 ms equates to listening from about 8 inches away from the sound source, whereas 50 ms equates to about 50 feet away. Anyway, hope you get there! 50 ms is a bit much imo.

[edit] After reading your other thread, it makes more sense now why the 0.68 ms or so is relevant to you…