How does pitch adjustment work?

Some friends were discussing voice correcting technology. We’re not really sure how it works. Anybody know? One guess is that it changes the pitch to the closest note; someone else suggested that it needed a song to match the singer’s note, and timing, too(?). Which brings up the point …

How does MIDI work? If I’m playing an A (440 cps) followed by an A (880 Hz), does that mean that the latter will scan the sound sample twice as fast? If so, then a higher note could not be held as long as a lower note. Any insight is welcomed.

I played around with Antares Auto-Tune at a friend’s studio. You have to set the key you’re singing in. Then it corrects to the nearest note in that key. We had a laugh setting it to extreme correction and pretending we were pop singers. :grinning:

Back in the 90s, one of my friends at the time worked for Digidesign and contributed to the development of what is now called auto tune. He worked from home a lot, and he invited me one time to check out what he was working on. He used his soprano sax to test it. The way he explained it to me at the time is that it adjusted the actual wave form of what you play.

  • Paul

Interesting. Adjusting the wave form would require an adjustment in time, not amplitude. That’s why I’m curious about how it works. Setting the key is interesting too in that it would avoid(?) “sour” notes?

A quick response to the MIDI question. MIDI is the instructions, not the sound: “Play note number x.” A-440 in MIDI would be “Note On #69”; A-880 would be “Note On #81”. Notes from A0 (27.5 Hz) to G9 (about 12543.85 Hz) are numbered 21 through 127.
The actual sound created when a given MIDI Note On message is received depends on the synth; A0 through G9 can be mapped to any sound, not just musical notes.
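For anyone who wants to check the numbers, here is a small sketch of the usual equal-temperament mapping between MIDI note numbers and frequencies. It assumes A4 = note 69 = 440 Hz and the common naming convention where note 60 is C4 (some manufacturers call it C3); the function names are made up for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_hz(note):
    # Each semitone multiplies the frequency by 2**(1/12); note 69 is A4 = 440 Hz.
    return 440.0 * 2 ** ((note - 69) / 12)

def note_name(note):
    # Assumes the "note 60 = C4" octave convention.
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

for n in (21, 69, 81, 127):
    print(n, note_name(n), round(midi_to_hz(n), 2))
```

Note that the message only carries the number; the frequency is whatever the receiving synth decides to do with it.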

Hey Ron,

To maybe clear up some confusion:

Voice correction: tools like Antares Autotune (software) or TC Helicon VoiceLive (hardware) are able to change the pitch (i.e. frequency) of the incoming audio signal to a specific target. The typical use case is to “straighten” vocals, when the singer is slightly off the intended note - ideally in a not-so-prominent way, but it can also be used for creative effect (“Cher effect”, Kanye West type sounds). You can adjust how strictly and how quickly this tool enforces the target pitch; lower values of strictness still sound natural, higher values result in notes being brutally flattened to the target pitch, resulting in the “Cher effect”.
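One way to picture the "strictness" control is as a fraction of the distance between the sung pitch and the target. The sketch below is only an illustration of that idea, not how any particular product implements it; `retune` and `strength` are hypothetical names, and the "how quickly" part (smoothing over time) is left out.

```python
import math

def retune(detected_hz, target_hz, strength):
    # Distance from sung pitch to target, measured in semitones (log frequency).
    offset_semitones = 12 * math.log2(target_hz / detected_hz)
    # strength = 0.0 leaves the note alone, 1.0 forces it fully onto the target.
    return detected_hz * 2 ** (strength * offset_semitones / 12)

print(retune(450.0, 440.0, 0.3))  # gentle nudge toward 440 Hz
print(retune(450.0, 440.0, 1.0))  # hard correction
```

With `strength` near 1.0 every wobble in the voice is removed, which is the flattened sound described above.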

Now how does this tool know what note to adjust towards? Two ways:

  • MIDI input, i.e. you feed MIDI notes to the tool and it will change the pitch of its input to the notes you play
  • scales: you define a scale, and it will adjust the pitch of input notes to the CLOSEST note on that scale.

The simplest and easiest form of scale-based adjustment is semitone adjustment: the input is simply adjusted to the nearest semitone. The advantage of it: it will work for songs in any key, as long as the singer is not too far off the target note.

In a number of tools, you can define other target scales beyond semitones, so that “sour notes” can be avoided
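The two targeting modes can be sketched in a few lines. This assumes equal temperament with A4 = 440 Hz (MIDI note 69) and uses C major as the example scale; all names here are invented for the example and do not come from any plugin.

```python
import math

MAJOR = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale from its root

def hz_to_midi(freq_hz):
    return 69 + 12 * math.log2(freq_hz / 440.0)

def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

def snap_to_semitone(freq_hz):
    # Chromatic mode: the nearest of all twelve notes, works in any key.
    return midi_to_hz(round(hz_to_midi(freq_hz)))

def snap_to_scale(freq_hz, root_pc=0, scale=MAJOR):
    # Scale mode: only notes belonging to the chosen key are allowed targets.
    note = hz_to_midi(freq_hz)
    allowed = [n for n in range(128) if (n - root_pc) % 12 in scale]
    return midi_to_hz(min(allowed, key=lambda n: abs(n - note)))

print(snap_to_semitone(280.0))  # lands on C#4, which is "sour" in C major
print(snap_to_scale(280.0))     # lands on D4, the closest note in C major
```

The 280 Hz example shows why setting the key matters: chromatic snapping happily corrects to a note that is not in the song's scale.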

There is also a variant of pitch adjustment: harmonizers. These use essentially the same pitch adjustment technology in multiple instances, so that they generate e.g. three target pitches from one input note. These target pitches are calculated either from scales or from chord harmonies (fed via MIDI or audio chord recognition), so that you can create harmony voices that fit with the song you are playing.
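As a rough sketch of the scale-based case: take the sung note, step up the scale to find the harmony notes, and hand each one to a pitch shifter as a frequency ratio. This assumes a major scale and harmonies a third and a fifth above; the function names are hypothetical.

```python
MAJOR = (0, 2, 4, 5, 7, 9, 11)

def harmony_notes(sung_note, root_pc=0, scale=MAJOR, steps=(2, 4)):
    # All MIDI notes that belong to the key, in ascending order.
    allowed = [n for n in range(128) if (n - root_pc) % 12 in scale]
    # Index of the scale note closest to what was sung.
    base = min(range(len(allowed)), key=lambda i: abs(allowed[i] - sung_note))
    # 2 scale steps up is a third, 4 steps up is a fifth.
    return [allowed[base + s] for s in steps if base + s < len(allowed)]

def shift_ratio(from_note, to_note):
    # Frequency ratio a pitch shifter would need to apply.
    return 2 ** ((to_note - from_note) / 12)

print(harmony_notes(60))               # C4 in C major -> E4 and G4
print(round(shift_ratio(60, 64), 4))   # ratio for the upper third
```

Because the steps are counted along the scale, the third above C is major (4 semitones) while the third above D is minor (3 semitones), which is what keeps the harmony in key.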

I assume you are not asking how MIDI works in general, but in connection with pitch adjustment - correct?

Most pitch adjustment tools need to work in real time, so they don’t adjust a sung “note” as a whole (i.e. take the whole note as a sample and simply play it back faster), but they divide it up into very small snippets that get transposed in real time. Therefore if you adjust a note by an octave, you can still hold that note for the same time, because the algorithm will just keep generating and transposing micro-snippets of that note. The result will probably not be very realistic, but it will definitely be one octave higher and of the same length. The exact workings of those algorithms are of course the “secret sauce” every plugin / hardware developer keeps to themselves.
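The "micro-snippets" idea can be sketched with a very naive granular overlap-add shifter. This is a toy under stated assumptions, not the algorithm of any commercial tool: the grains start at the normal playback positions (so the length stays the same), but inside each grain the input is read `ratio` times faster (so the pitch changes). Function names and the grain size are made up for the example.

```python
import math

def hann(n):
    # Periodic Hann window; copies spaced n/2 apart add up to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def read_interp(x, pos):
    # Linear interpolation between samples; silence outside the signal.
    i = math.floor(pos)
    if i < 0 or i + 1 >= len(x):
        return 0.0
    frac = pos - i
    return (1 - frac) * x[i] + frac * x[i + 1]

def granular_pitch_shift(x, ratio, grain=200):
    hop = grain // 2
    win = hann(grain)
    out = [0.0] * len(x)
    # Grain start points advance at normal speed, so duration is preserved.
    for start in range(-hop, len(x), hop):
        for i in range(grain):
            n = start + i
            if 0 <= n < len(out):
                # Inside the grain the input is scanned `ratio` times faster.
                out[n] += win[i] * read_interp(x, start + i * ratio)
    return out

def upward_zero_crossings(x):
    # Crude frequency estimate: count negative-to-positive transitions.
    return sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)

sr = 8000
tone = [math.sin(2 * math.pi * 240 * n / sr) for n in range(sr)]  # 1 s at 240 Hz
up = granular_pitch_shift(tone, 2.0)
print(len(tone), len(up))
print(upward_zero_crossings(tone), upward_zero_crossings(up))
```

The test tone is chosen so that neighbouring grains happen to line up in phase. With a real voice they will not, and the overlaps partly cancel, which is one reason the naive version sounds unrealistic. Publicly documented refinements include aligning grains to the pitch period (PSOLA-style methods) and the phase vocoder.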

Hope this answers some of your questions.



Thanks, all, for the input. I have spent a lot of time researching this online. What I’ve read and heard distills down to yada, yada, yada; the “secret sauce” is still secret. What I was expecting was possibly one of two methods.

The first method I envisioned was splitting a wave into minuscule tall thin rectangles, and then keeping the trailing value a bit longer to lower a pitch, or chopping a bit off the trailing value to raise the pitch (I may have those backwards). But this way the time span would be compressed or expanded – so that could not be it.

The second method I envisioned was altering the analog-to-digital value of the minuscule tall thin rectangles by adding or subtracting some value in order to make the resultant digital-to-analog sound higher or lower. This would change the note – not the volume – without compressing or expanding the time span.

I just can’t find that level of explanation online. I wonder if VST works that way. (Maybe I should not have used MIDI in a previous posting).

Pitch correction is done in the digital domain, therefore there is no time stretching or compression involved. Ones and zeros (bits) can be manipulated in any way, shape, or form.
As a VST, the audio is already in digital format. Live devices like TC Helicon or Autotune send the analog audio through an ADC, manipulate the bits, and send the result through a DAC back to the analog output. There will be a bit of latency, usually not noticeable.
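To put a rough number on "a bit of latency": part of it comes from the block of samples the processor collects before working on it, and a pitch detector also needs to see at least about one full cycle of the note. The figures below are illustrative assumptions (256-sample buffer, 48 kHz, an 80 Hz low note), not specifications of any device.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    # Time needed to fill one processing buffer.
    return 1000.0 * buffer_samples / sample_rate_hz

def period_ms(freq_hz):
    # Duration of one cycle of a note; a detector needs roughly this much signal.
    return 1000.0 / freq_hz

print(round(buffer_latency_ms(256, 48000), 2))  # one buffer at 48 kHz
print(round(period_ms(80.0), 2))                # one cycle of a low 80 Hz note
```

Real hardware adds converter and algorithm delay on top of this, so treat these as lower bounds.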

Thank you, John Toth, that’s an explanation I can accept.

Weeeeell, maybe not. It is easy to confuse amplitude (volume, loudness) with pitch (frequency, speed).

So bring out the old stopwatch. Cantabile has a window that shows a cursor traveling from left-to-right across the sound wave it is playing. The cursor moves slower for lower frequency notes, and faster for higher frequency notes.

Changing a digital value would change its volume, not its speed. How do you change a frequency digitally?

It’s not as easy as you may think. Of course you can play the samples back faster or slower, pitching playback up or down (think speeding up or slowing down a “digital tape”). But if you want to either preserve speed while changing pitch or preserve pitch while changing speed, you need sophisticated algorithms, most of which are (understandably) proprietary. Typical algorithms of this type are elastique or MPEX…
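The "digital tape" case is easy to sketch: read through the samples faster or slower and interpolate between them. The sketch below (with a hypothetical `resample` helper) shows that pitch and duration change together, which is exactly the coupling the more sophisticated algorithms have to break.

```python
import math

def resample(x, ratio):
    # Step through the input `ratio` samples at a time, interpolating linearly.
    out = []
    pos = 0.0
    while pos < len(x) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
        pos += ratio
    return out

sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s of A-440
octave_up = resample(tone, 2.0)
print(len(tone) / sr, len(octave_up) / sr)  # duration in seconds before and after
```

Played back at the same sample rate, `octave_up` sounds at 880 Hz but lasts only half as long, so a held note really would get shorter if this were all a pitch shifter did.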

Of course, all of this happens in the digital domain (that’s why we are talking algorithms)…

If you want to understand the science behind it, here’s an introductory article:

Unfortunately, it’s not quite as easy as the ideas you guys discussed above - significant digital signal processing happening…

I’m out of this discussion now…



In the digital domain you can alter amplitude, frequency, or time. Not just volume.
I have never seen Cantabile speed up or slow down the transport based on frequency. But, I have never watched the timeline with a stopwatch. It could be the “display” is what is changing the perceived speed. The transport has to keep accurate steady timing, real or musical. BPM does not change.

Well, George Martin and Geoff Emerick would like to differ. Strawberry Fields Forever is famously known to be a merge of two takes, in Bb and C.

They did not have digital time copy/paste and time stretching but I guess once Lennon said “You can fix it, George”, they had no other choice than succeeding with their tapes, varispeed and a pair of scissors.

We’re not talking about “pitch correction” from 50+ years ago. The question was about modern pitch correction, as in AutoTune and TC Helicon. Which IS entirely in the digital domain.
And George Martin was not correcting a few bad vocal notes.

For 34 weeks, I KJed (hosted as a Karaoke Jockey) at my apartment complex (until one of the guests got rough, which resulted in our getting shut down). The system was Karafun – great! – and had a feature where the key could be made higher or lower, and / or the tempo / speed could be made faster or slower. In changing the pitch, the background singers could sound like chipmunks or drunks, but changing the pitch alone did not affect the time. I sure wish I knew how that is done.

I can understand a tape being stretched or shrunk. I can understand the same tape being played faster or slower. But I can’t understand how a pitch can be changed without affecting the time. BUT it exists!

And has existed for some time.

GETTING WARMER … ¿fsk frequency shift keying? 2:22

Excellent video!

Ron, I’ve been trying to tell you. In the analog domain frequency and time are inseparable. This is NOT the case when manipulating zeros and ones with mathematical algorithms. The digital domain is COMPLETELY different; there is NO analog continuum.

@Jtoth, I am not contradicting you. It is obvious that the case exists where the pitch is altered within a given time frame, without changing the time frame. What I’m wanting to know is how it is done.

My latest guess is that the thinner FSK square waves are propagated to make a higher pitch (overlaying a portion of the thicker wave that follows), and some are eliminated to make a lower pitch (the thick wave overlaying the thinner waves that follow). But that is my guess. I’ll find the true answer somewhere on the internet. There are no secrets anymore.

BTW, there is a timer chip – 555 – that converts an analog wave into an FSK wave. My first encounter with this was as a Dual Inline Package (DIP). Volume is unaffected.