Artificial Speech

Steve Whiteley wrote:

How would I design a patch that had a throaty bull frog croak envelope while voicing my son's name "MARC"? I have played with "Clavia" and frog patches trying to isolate the necessary parts with some success. I am interested in the process, the thinking of how you break down the task, not just a patch.

Kees van der Maarel wrote:

I tried to make a patch which does just that. Since the update from OS version 1.1 to 2.0, which gave us the vocal filter, I have made several artificial speech patches. Synthesizing the vowels is dead easy because of the vocal filter. Generating the consonants is more difficult. Consonants like S, T, V, etc., consist of filtered noise, together with a suitable envelope curve. Consonants like M, N, L, are formed by filtering the sound of the vocal cords in a certain way. When somebody speaks the letter "N", for instance, the nasal cavity is "switched" in between. This cavity acts as a sort of filter, just as any part of the vocal tract does.
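The "filtered noise plus envelope" idea for consonants like S and T can be sketched outside the Modular in a few lines of Python. This is only an illustration, not the contents of the patch: the function name noise_burst, the one-pole high-pass filter, and all the numbers are assumptions chosen to show how the envelope shape alone changes the character of the noise.

```python
import math
import random

def noise_burst(n_samples, attack, decay, hp_coeff=0.9, seed=0):
    """High-passed white noise shaped by a linear attack and an exponential decay.

    All parameter values are illustrative; they are not taken from the patch.
    """
    rng = random.Random(seed)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for i in range(n_samples):
        x = rng.uniform(-1.0, 1.0)                 # white noise source
        # One-pole high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])
        y = hp_coeff * (prev_out + x - prev_in)
        prev_in, prev_out = x, y
        if i < attack:
            env = i / attack                       # linear rise
        else:
            env = math.exp(-(i - attack) / decay)  # exponential fall
        out.append(env * y)
    return out

# Same noise source, different envelopes (lengths in samples, hypothetical):
s_like = noise_burst(4000, attack=400, decay=1500)  # slow onset, long tail
t_like = noise_burst(1000, attack=5, decay=80)      # abrupt onset, short tail
```

A slow attack with a long tail gives a hiss, while a very short attack and decay gives a click-like burst; in a real patch the filter would be tuned per consonant.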

In the attached example you see two crossfader modules, called "M-fade" and "R-fade". At a certain moment, determined by the "brains" of the patch, the Event sequencers, these crossfaders switch another filter into the signal chain. The values of these filters were determined either by listening carefully to what happens when you pronounce these consonants, or by recording a real voice and looking at it with a spectrum analysis program. The "K"-sound in this patch is rather complex. It consists of a pulse, which is filtered by the K-ClickFilter, together with a burst of quantized noise. There is also a control sequencer which controls the intonation of the spoken word. With knob 7 you can change this intonation to "questioning" or "affirmation".
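As a rough sketch of what a crossfader like "M-fade" does, the following Python switches a filtered copy of the signal into the chain on the steps where a gate pattern is 1. The names crossfade, one_pole_lowpass and route, the gate pattern, and the simple low-pass standing in for the consonant filter are all assumptions for illustration; the real filter settings in the patch were found by ear or by spectrum analysis.

```python
def crossfade(a, b, mix):
    """Linear crossfade: mix=0 returns a, mix=1 returns b."""
    return (1.0 - mix) * a + mix * b

def one_pole_lowpass(signal, coeff):
    """Placeholder filter standing in for the consonant filter."""
    out, y = [], 0.0
    for x in signal:
        y += coeff * (x - y)
        out.append(y)
    return out

def route(signal, gate_steps, samples_per_step, coeff):
    """Switch the filtered path in on steps where the gate is 1."""
    filtered = one_pole_lowpass(signal, coeff)
    out = []
    for i, dry in enumerate(signal):
        # Hold the last step if the signal outlasts the pattern.
        step = min(i // samples_per_step, len(gate_steps) - 1)
        out.append(crossfade(dry, filtered[i], gate_steps[step]))
    return out

# A sawtooth as the "vocal cords", with the filter active on step 0 only
# (hypothetical pattern: the consonant sits at the start of the word).
saw = [((i % 50) / 25.0) - 1.0 for i in range(400)]
m_fade = [1, 0, 0, 0]
out = route(saw, m_fade, samples_per_step=100, coeff=0.1)
```

Fractional gate values between 0 and 1 would give a gradual transition instead of a hard switch.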

All my artificial speech patches work this way: one or more event sequencers, running in parallel, to determine the timing of the syllables, a control sequencer for selecting the right vowel, and one for controlling the intonation. Once you get the hang of it, it's quite easy, actually. Instead of the sawtooth wave from the "Vocal Chords"-oscillator, you can use a noise source, which gives a whispering effect (turn knob 5), or a sound which comes from the audio inputs of the Modular. Just some ideas...
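The overall structure, with parallel sequencers stepping together, might be modelled roughly like this in Python. Everything here is hypothetical: the step patterns, the play and intonation function names, and the 4-semitone range are made up for illustration, and the knob is simply assumed to scale a rising or falling pitch slope.

```python
def intonation(step, n_steps, knob):
    """Pitch offset in semitones at a given step.

    knob runs from -1.0 (falling, "affirmation") to +1.0 (rising,
    "questioning"); the 4-semitone range is an arbitrary choice.
    """
    position = step / (n_steps - 1)   # 0.0 at first step, 1.0 at last
    return knob * 4.0 * position

def play(event_seq, vowel_seq, knob):
    """Step all sequencers in parallel and collect one frame per step."""
    n = len(event_seq)
    frames = []
    for step in range(n):
        frames.append({
            "step": step,
            "gate": bool(event_seq[step]),   # event sequencer: timing
            "vowel": vowel_seq[step],        # control sequencer: vowel
            "pitch_offset": intonation(step, n, knob),
        })
    return frames

# Hypothetical 8-step patterns for a one-syllable word.
event_seq = [1, 1, 1, 1, 1, 0, 0, 0]
vowel_seq = ["A", "A", "A", "A", "A", "A", "A", "A"]

question = play(event_seq, vowel_seq, knob=1.0)
statement = play(event_seq, vowel_seq, knob=-1.0)
```

Keeping timing, vowel choice, and intonation in separate lists mirrors the idea of separate sequencers: each can be edited without touching the others.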

See also the last part of Formant Frequency