Speech signal generation and form
Author: Ph.D. Marcin Just (DiagNova Technologies)
Speech signal generation process
There would be no acoustic analysis of speech without speech itself. Speech is the basis, and all analysis methods are adapted to its specific properties. To fully understand the principles governing the analysis, one must first understand how the voice is generated. The process rests essentially on one physical law: Bernoulli's law. In simplified form, it states that the faster a gas moves, the lower the pressure in it. This law is "responsible" for the lift acting on the wings of airplanes (Fig. 1) and for the force that closes the vocal folds.
Fig. 1. Aerodynamic (lift) force acting on the wing (drawn in section). The air flowing over the top of the wing (route "B") follows a longer path and moves faster than the air flowing below it (route "A"), so according to Bernoulli's law the pressure at the top is lower and "sucks" the wing upward.
From the outside, the folds are subjected to a force from the atmospheric pressure, and from the inside, to a force from the pressure of the air flowing out of the lungs. Since the air passing between the folds is in motion, the pressure in it is lower, and this draws the folds together spontaneously. Once they are closed, the air stops moving, the pressures balance and the folds return to their original position. This movement repeats cyclically with a frequency that depends on the mass of the folds and their elasticity.
Thus, by varying the force that tenses the folds, a person can regulate the frequency with which the air path closes and opens. These changes in the airflow are the "engine" that generates the voice and, ultimately, the speech signal. The final effect is also influenced by other factors, as shown in Fig. 2.
Fig. 2. Schematic representation of the speech signal generation process.
The cyclically interrupted stream of air is combined with various types of noise, passes through the resonance chambers (which act much like the reverberation in a cave or the resonance box of a violin) and can finally be recorded by a microphone.
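The chain described above (an interrupted air stream, added noise, a resonance chamber) can be sketched in a few lines of Python. This is only an illustration under assumed values, not the model behind Fig. 2: the function names, the single resonance at 700 Hz with a 100 Hz bandwidth, the noise level and the 22050 Hz sampling rate are all hypothetical, and the whole set of resonance chambers is reduced to one two-pole resonator.

```python
import math
import random

SAMPLE_RATE = 22050  # samples per second (assumed value)

def pulse_train(f0_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Idealized interrupted air stream: one unit pulse per fold cycle."""
    period = sample_rate / f0_hz  # cycle length in samples
    signal = [0.0] * n_samples
    k = 0
    while round(k * period) < n_samples:
        signal[round(k * period)] = 1.0
        k += 1
    return signal

def add_noise(signal, level, seed=0):
    """Add uniform noise in [-level, level]; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [s + level * rng.uniform(-1.0, 1.0) for s in signal]

def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole resonator standing in for a single resonance chamber."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# One second of a synthetic "voice" with folds closing 105 times per second.
source = pulse_train(f0_hz=105.0, n_samples=SAMPLE_RATE)
voiced = resonator(add_noise(source, level=0.01),
                   freq_hz=700.0, bandwidth_hz=100.0)
```

Changing `f0_hz` changes how often the stream is interrupted, while changing `freq_hz` changes only the coloring added by the resonator; the two are independent in this sketch.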
Voice as a periodic sound wave
The vast majority of speech analysis methods rely on its periodic character. Because the air stream is interrupted by the vocal folds, the final signal leaving the mouth and recorded by the microphone also repeats cyclically (it is periodic). The microphone converts the instantaneous pressure value into a voltage level that can be easily recorded. Today, the easiest way is to store the speech signal digitally. The voltage is converted tens of thousands of times per second (usually 22050 times per second; this rate is called the sampling rate) into a numerical value that the computer saves in memory. It is easy to plot a graph from such successive "samples" (Fig. 3). This graph is called an oscillogram.
Fig. 3. Speech signal oscillogram: a 1-second fragment at the top and a 20-millisecond excerpt at the bottom, which reveals the periodic character.
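To make the relation between samples and time concrete, the short Python sketch below counts the samples in fragments like those of Fig. 3 and builds a synthetic periodic signal. The rate of 22050 samples per second is the one quoted above; the function names and the two-harmonic test signal are hypothetical stand-ins for a real recording.

```python
import math

SAMPLE_RATE = 22050  # samples per second, the rate quoted in the text

def samples_in(duration_s, sample_rate=SAMPLE_RATE):
    """Number of samples covering duration_s seconds."""
    return round(duration_s * sample_rate)

def sample_time(index, sample_rate=SAMPLE_RATE):
    """Time in seconds at which sample number `index` was taken."""
    return index / sample_rate

def synth_periodic(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for a recorded voice: two summed harmonics."""
    samples = []
    for i in range(samples_in(duration_s, sample_rate)):
        t = sample_time(i, sample_rate)
        samples.append(math.sin(2 * math.pi * f0_hz * t)
                       + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * t))
    return samples

print(samples_in(1.0))    # samples in the 1-second fragment
print(samples_in(0.020))  # samples in the 20-millisecond fragment

# Plotting `fragment` against sample_time(i) gives a simple oscillogram.
fragment = synth_periodic(f0_hz=105.0, duration_s=0.020)
```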
Fundamental frequency
The length of time between consecutive closures of the vocal folds defines the shortest repeating sequence in a speech signal. It is called the base period (T0), and its relation to the oscillogram is shown in Fig. 4.
Fig. 4. Finding the base period on an oscillogram. The characteristic downward peaks in the graph mark the moments just before the vocal folds close.
However, the base period is not the most commonly used term in the field of speech analysis. Usually its inverse, the fundamental frequency, is used instead:
F0 = 1/T0.
Because a speech signal usually contains a large number of base periods (several hundred), the fundamental frequency defined by the formula above can be determined many times within the analyzed signal, period by period. The changing value of the fundamental frequency can therefore be plotted as a graph. This important graph of the fundamental frequency is called the pitch contour (often simply "pitch"); an example is shown in Fig. 5.
Fig. 5. Fundamental frequency chart
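The period-by-period calculation described above can be sketched as follows, applying F0 = 1/T0 to each pair of consecutive fold closures. The function name and the list of closure times are hypothetical; in practice the closure instants would first have to be located on the oscillogram.

```python
def f0_contour(closure_times_s):
    """Return (time, F0) pairs, one per base period.

    closure_times_s: moments of consecutive fold closures, in seconds.
    Each base period T0 is the gap between two neighbouring closures,
    and the fundamental frequency for that period is F0 = 1 / T0.
    """
    contour = []
    for start, end in zip(closure_times_s, closure_times_s[1:]):
        t0 = end - start
        if t0 <= 0:
            raise ValueError("closure times must be strictly increasing")
        contour.append((start, 1.0 / t0))
    return contour

# Hypothetical closure instants: periods of 10 ms, 9 ms and 8 ms,
# i.e. a voice whose fundamental frequency is rising.
closures = [0.000, 0.010, 0.019, 0.027]
for t, f0 in f0_contour(closures):
    print(f"{t:.3f} s  {f0:.1f} Hz")
```

Plotting the second column against the first gives a graph of the kind shown in Fig. 5.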
The fundamental frequency can be determined manually from the graph, but doing this correctly for a large number of base periods is extremely tedious work. Usually, an automatic algorithm executed by a computer is used instead.
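The text does not say which automatic algorithm is meant. As one common possibility, the sketch below estimates the fundamental frequency of a short frame by normalized cross-correlation: it looks for the shift (lag) at which the signal best matches itself and converts that lag to a frequency. The function name, the 60 to 400 Hz search range and the `min_score` threshold are assumptions chosen for illustration.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, min_score=0.5):
    """Estimate F0 (Hz) of one frame, or return None if no clear period.

    For every candidate lag, the start of the frame is compared with a
    copy shifted by that lag; the lag with the highest normalized
    correlation is taken as the base period T0 (in samples).
    """
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    window = len(frame) - lag_max         # samples compared at each lag
    if lag_min < 1 or window <= 0:
        return None
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]         # remove any constant offset
    ref_energy = sum(v * v for v in x[:window])
    if ref_energy == 0.0:
        return None                       # silent frame
    best_lag, best_score = None, min_score
    for lag in range(lag_min, lag_max + 1):
        shifted = x[lag:lag + window]
        shifted_energy = sum(v * v for v in shifted)
        if shifted_energy == 0.0:
            continue
        cross = sum(a * b for a, b in zip(x, shifted))
        score = cross / math.sqrt(ref_energy * shifted_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return None if best_lag is None else sample_rate / best_lag

# Synthetic 40 ms test frame with a 105 Hz fundamental (hypothetical data).
SAMPLE_RATE = 22050
frame = [math.sin(2 * math.pi * 105.0 * i / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 210.0 * i / SAMPLE_RATE)
         for i in range(882)]
print(estimate_f0(frame, SAMPLE_RATE))  # should be close to 105 Hz
```

Applying such a function to successive short frames of a recording gives one fundamental frequency value per frame. Real speech is only approximately periodic, so practical algorithms add further safeguards, for example against choosing a lag of two base periods instead of one.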