Selected methods of analysis of periodic signals
Author: Marcin Just, Ph.D. (DiagNova Technologies)
Periodic signals can be analyzed in many ways, but due to their nature two methods turn out to be particularly useful: Fourier analysis and linear prediction. For speech signals, parameterization is often used as well; it is essentially an extension of both methods.
Fourier analysis
Fourier analysis is based on a simple idea: instead of examining a complicated signal directly, present it as a sum of simpler signals whose behavior is easier to predict and study. Such analysis is used in many fields of science, wherever the system under test responds to a sum of component signals with the sum of its responses to each component (the system must be linear; in particular, scaling the input signal scales the output proportionally). In acoustics, this assumption holds in many cases. It is important to choose the so-called basis, i.e. a set of simple (easy to analyze) signals that can add up to any signal found in the system. For periodic signals, a very convenient basis is the set of sinusoids with increasingly frequent oscillations (trigonometric functions of the form y = sin(ax + b), where the parameter a determines the oscillation frequency and b the phase shift - Fig. 6).
Fig. 6. Sinusoid: a) y = sin(x + 0); b) y = sin(3x + π/4)
They are a good choice because they are periodic themselves. None of them can be represented as a sum of the other sinusoids, which ensures that the signal representation is unique (there is only one combination of sinusoids that adds up to the tested signal).
The principle of Fourier analysis is simple - the smallest repeating element is selected from the analyzed waveform and presented as a sum of sinusoids, as shown in Fig. 7.
Fig. 7. Principle of Fourier analysis
The magnitude of the contribution of individual sinusoids (for specific oscillation frequencies) is usually visualized as the height of the bar in a bar graph (Fig. 8). Such a plot is called a signal spectrum.
Fig. 8. Fourier spectrum
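As a minimal sketch of this decomposition, the Python fragment below (using NumPy, which the article itself does not mention) builds one period of a made-up signal from two sinusoids, computes the bar heights of its spectrum with the discrete Fourier transform, and confirms that adding the sinusoids back together reproduces the period. The signal, its length N and the amplitudes are illustrative assumptions.

```python
import numpy as np

# One period of a hypothetical periodic signal, sampled at N points.
N = 64
n = np.arange(N)
period = (1.0 * np.sin(2 * np.pi * 1 * n / N)
          + 0.5 * np.sin(2 * np.pi * 3 * n / N + np.pi / 4))

# Discrete Fourier transform of the single period.
coeffs = np.fft.rfft(period)

# Bar heights of the spectrum: amplitude of each component sinusoid.
amplitudes = 2 * np.abs(coeffs) / N
amplitudes[0] /= 2    # the constant term is not doubled
amplitudes[-1] /= 2   # neither is the highest (Nyquist) term for even N

# Summing the sinusoids again reproduces the original period.
rebuilt = np.fft.irfft(coeffs, n=N)

print(np.round(amplitudes[:5], 3))
print(np.allclose(rebuilt, period))
```

Only the bars for the first and third sinusoid are non-zero here, with heights 1 and 0.5, which matches how the test signal was built.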
The characteristic "peaks" in the spectrum of a speech signal correspond to frequencies amplified in the resonant cavities, the so-called formants. This is actually the only useful piece of information that can be read from such a chart, so its usefulness is limited (especially since there are better methods for reading them). It is particularly important that a single spectrum does not, by definition, carry any information about the differences between successive basic periods of the speech signal (and this is the information the examiner is particularly interested in). However, a slight modification is enough to significantly improve the usefulness of Fourier analysis: instead of one basic period, a group of several consecutive periods is analyzed. The spectrum then becomes fundamentally different. In the ideal case, when the analyzed periods are identical and there is no noise, the spectrum takes the form of single isolated peaks (Fig. 10a). The peaks are separated by "empty" areas, which become longer (contain more bars) the larger the set of analyzed periods.
Because it is difficult to select groups of exactly several periods, a simpler method is used: the analyzed fragment has a constant length, e.g. 0.025 s, which is not an integer multiple of the period length (e.g. it covers about 6.3 periods, which causes some small errors). This is the conventional Fourier analysis used e.g. in spectrograms. The spectrum obtained for a signal of this length is quite characteristic - it contains maxima (peaks) at multiples of the fundamental frequency (Fig. 9), the so-called harmonic frequencies.
Fig. 9. Classic Fourier spectrum
It is important that, regardless of whether consecutive periods are different or identical and whether noise is present or not, the bars between the peaks do not have zero height if the analyzed segment does not exactly match the length of a whole number of periods. Of course, the height of the "peaks" reflects the contribution of successive multiples of the fundamental frequency (ideally the only components of the speech signal), and the height of the bars between the peaks is related to the presence of noise, but this overlaps with the effect of the mismatch between the segment length and a multiple of the basic period. This somewhat disturbs and hinders the analysis. A much "cleaner" spectrum is obtained when the analyzed segment covers a strictly defined whole number of periods.
Fig. 10. Spectra for the analyzed segment correlated with length of the period: a) ideal case; b) average case
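The effect of the segment length can be illustrated with a short Python sketch (NumPy assumed). A noise-free synthetic signal with three harmonics is analyzed twice: once over exactly 6 periods and once over 6.3 periods. The sampling rate, fundamental frequency and harmonic amplitudes are illustrative assumptions, not values from the article.

```python
import numpy as np

fs = 8000                        # sampling rate in Hz (assumed)
f0 = 200                         # fundamental frequency in Hz (assumed)
samples_per_period = fs // f0    # 40 samples per period

def harmonic_signal(num_samples):
    """Noise-free periodic signal made of three harmonics."""
    t = np.arange(num_samples) / fs
    return (np.sin(2 * np.pi * f0 * t)
            + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
            + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

def magnitude_spectrum(x):
    return np.abs(np.fft.rfft(x)) / len(x)

# Exactly 6 periods (240 samples): harmonics fall on bars 6, 12 and 18.
exact = magnitude_spectrum(harmonic_signal(6 * samples_per_period))

# About 6.3 periods (252 samples): the segment no longer matches the period.
inexact = magnitude_spectrum(harmonic_signal(252))

# Bar 9 lies between the first and second harmonic peak in both spectra.
print(exact[9], inexact[9])
```

With the matched segment the bar between the peaks is zero up to rounding error, while with the mismatched segment it is clearly non-zero even though the signal contains no noise at all.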
The amplitude of noise and distortion can now be determined accurately in relation to the amplitude of the harmonic components by measuring the ratio of harmonic to non-harmonic peaks (Fig. 10b). While peaks associated with distortions (differences in the length and shape of the periods) predominate in the lower part of the spectrum, the upper part (above 4000 Hz) mainly carries information about noise.
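A rough sketch of such a measurement is shown below in Python (NumPy assumed). When the segment covers exactly `num_periods` periods, the harmonics occupy every `num_periods`-th bar of the spectrum and all remaining bars are non-harmonic. The function name `nonharmonic_to_harmonic`, the test signal and the noise level are hypothetical; this is not the author's definition of any specific parameter, only an illustration of the principle.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0, num_periods = 8000, 200, 6          # assumed example values
n_samples = num_periods * fs // f0          # 240 samples = exactly 6 periods
t = np.arange(n_samples) / fs

clean = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
noisy = clean + 0.05 * rng.standard_normal(n_samples)

def nonharmonic_to_harmonic(x, num_periods):
    """Energy in non-harmonic bars divided by energy in harmonic bars."""
    power = np.abs(np.fft.rfft(x)) ** 2
    power[0] = 0.0                           # ignore the constant component
    is_harmonic = np.zeros(len(power), dtype=bool)
    is_harmonic[num_periods::num_periods] = True
    return power[~is_harmonic].sum() / power[is_harmonic].sum()

ratio_clean = nonharmonic_to_harmonic(clean, num_periods)
ratio_noisy = nonharmonic_to_harmonic(noisy, num_periods)
print(ratio_clean, ratio_noisy)
```

For the clean signal the ratio is zero up to rounding error; adding noise raises it, because the noise energy spreads over the bars between the harmonics.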
Spectra are determined for relatively short time intervals, so many of them can be computed over the entire recorded sample. By placing successively calculated spectra next to each other and converting the height of the bars to a degree of darkness, an exceptionally useful diagram is obtained, called a spectrogram. The method of generating it is shown in Fig. 11.
Fig. 11. Generating a spectrogram from a series of spectra
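The procedure can be sketched in a few lines of Python (NumPy assumed). The function `spectrogram`, the Hann window, the frame length and the hop size are assumptions made for this illustration; the article does not specify them.

```python
import numpy as np

def spectrogram(x, frame_len, hop):
    """Stack magnitude spectra of consecutive frames side by side."""
    window = np.hanning(frame_len)            # assumed smoothing window
    starts = range(0, len(x) - frame_len + 1, hop)
    columns = [np.abs(np.fft.rfft(window * x[s:s + frame_len]))
               for s in starts]
    return np.array(columns).T                # rows: frequency, columns: time

fs = 8000                                     # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                        # one second of signal
x = np.sin(2 * np.pi * 200 * t)               # 200 Hz test tone

S = spectrogram(x, frame_len=200, hop=100)    # 25 ms frames, 12.5 ms step

# Convert bar heights to a logarithmic scale, usable as grey levels.
darkness = 20 * np.log10(S + 1e-12)
print(S.shape)
```

Each column of `S` is one spectrum; plotting `darkness` as an image (for example with a plotting library of choice) gives the familiar spectrogram picture, here with a single horizontal line at 200 Hz.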
Narrow and broadband spectrograms
The longer the signal segment used to compute a spectrum, the better its frequency resolution (better separation of harmonic peaks). A spectrogram built from such spectra will also have excellent resolution in the frequency domain; however, because each spectrum covers a long time interval, its temporal resolution will be poor. Conversely, using very short segments gives excellent temporal resolution but worse frequency resolution. Unfortunately, these two requirements cannot be reconciled. Hence, there are two types of spectrograms:
- narrowband (20 Hz bandwidth, good frequency resolution – Fig. 12a);
- broadband (240 Hz bandwidth, good time resolution – Fig. 12b).
Fig. 12. Two types of spectrograms: a) narrowband; b) broadband
The names of spectrogram types come from the times when they were created by electronic analyzers characterized by a specific band (frequency range). Bandwidth can simply be interpreted as the resolution of a spectrogram in the frequency domain.
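As a rule of thumb, the frequency resolution of a spectrum is roughly the reciprocal of the analyzed segment length. The small Python sketch below applies this approximation to the two bandwidths given above; the exact constant depends on the window shape, so the results are order-of-magnitude estimates, and the sampling rate is an assumption.

```python
def segment_length_seconds(bandwidth_hz):
    """Approximate segment length giving the requested frequency resolution."""
    return 1.0 / bandwidth_hz

fs = 16000                                    # sampling rate in Hz (assumed)

narrow = segment_length_seconds(20)           # narrowband spectrogram
broad = segment_length_seconds(240)           # broadband spectrogram

print(f"narrowband: {narrow * 1000:.1f} ms, about {round(narrow * fs)} samples")
print(f"broadband:  {broad * 1000:.1f} ms, about {round(broad * fs)} samples")
```

Under this approximation the narrowband spectrogram needs segments of about 50 ms, long enough to cover several basic periods, while the broadband one uses segments of roughly 4 ms, shorter than a typical basic period.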
Fourier analysis and the fundamental frequency
In the above-mentioned case, when a fragment of the signal longer than the base period is taken for the Fourier analysis, the spectrum will show peaks related to successive multiples of the fundamental frequency. Of course, the first one corresponds to the fundamental frequency itself (Fig. 13). In the narrowband spectrogram, successive harmonics (multiples of F0) appear as horizontal lines, the lowest of which determines the course of the fundamental frequency (Fig. 14).
Fig. 13. Fundamental frequency in the Fourier spectrum
Fig. 14. The fundamental frequency on a narrowband spectrogram
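A minimal Python sketch of reading the fundamental frequency from a spectrum is given below (NumPy assumed). It simply picks the strongest peak within an assumed search range of 60-400 Hz, which works for this synthetic signal because its first harmonic is the strongest; real speech usually calls for more robust methods. All numeric values are illustrative.

```python
import numpy as np

fs = 8000                                   # sampling rate in Hz (assumed)
f0_true = 125.0                             # fundamental of the test signal
n_samples = 1600                            # 0.2 s, i.e. 25 basic periods
t = np.arange(n_samples) / fs

# Synthetic periodic signal: five harmonics with decreasing amplitude.
x = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, 6))

spectrum = np.abs(np.fft.rfft(x * np.hanning(n_samples)))
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)

# Look for the strongest peak only where a fundamental is plausible.
in_range = (freqs >= 60) & (freqs <= 400)
f0_est = freqs[in_range][np.argmax(spectrum[in_range])]
print(f0_est)
```

Repeating this for every column of a narrowband spectrogram would trace the course of the fundamental frequency over time.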
Linear prediction
Like Fourier analysis, linear prediction is based on a relatively simple idea. If a speech signal is formed from the signal generated directly by the vocal folds (in Polish literature this signal is referred to as "voice", elsewhere as the primary signal or excitation) subjected to multiple reflections in the resonant cavities, it can be presented as a sum of several differently delayed excitation signals. From here it is only a step to expressing the speech signal at a given moment as a weighted sum of its samples at previous moments.
Because it is closely tied to how the resonant cavities work, linear prediction is particularly suitable for determining their resonance frequencies (formants).
After the formants are determined, their influence on the speech signal can be eliminated, i.e. the original signal generated by the folds can be recreated. This process is called inverse filtering.
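The idea can be sketched in Python as follows (NumPy assumed). A train of impulses stands in for the excitation, a single two-pole resonator stands in for the resonant cavities, and the predictor weights are estimated with the autocorrelation method, one common way of fitting a linear predictor. The function names and all numeric values are assumptions made for this illustration, not the author's implementation.

```python
import numpy as np

def synthesize(excitation, a):
    """x[n] = excitation[n] + sum_k a[k] * x[n-1-k] (simple resonator)."""
    x = np.zeros(len(excitation))
    for n in range(len(x)):
        past = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        x[n] = excitation[n] + past
    return x

def lpc_coefficients(x, order):
    """Predictor weights from the autocorrelation (normal) equations."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    error_filter = np.concatenate(([1.0], -np.asarray(a)))
    return np.convolve(x, error_filter)[:len(x)]

fs = 8000                                   # sampling rate in Hz (assumed)
radius, freq = 0.85, 500.0                  # one resonance at 500 Hz (assumed)
theta = 2 * np.pi * freq / fs
a_true = np.array([2 * radius * np.cos(theta), -radius ** 2])

excitation = np.zeros(800)
excitation[::80] = 1.0                      # one impulse per 10 ms period

speech = synthesize(excitation, a_true)
a_est = lpc_coefficients(speech, order=2)
residual = inverse_filter(speech, a_est)    # approximates the excitation

print(np.round(a_true, 3), np.round(a_est, 3))
```

In this idealized setting the estimated weights come out very close to the true ones and the residual is close to the original impulse train. Real speech needs a higher prediction order and a more realistic excitation model.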
Formants
Formants (vocal tract resonance frequencies) are the basic feature that differentiates individual sounds. Formants for words and sentences change over time as the sounds change, and their course, as shown in Fig. 15, can be traced on spectrograms (broadband spectrograms are commonly recommended for this purpose, because in narrowband spectrograms the individual harmonics tend to mask the formant bands). Linear prediction makes it possible to determine the formants automatically and more precisely than from spectrograms.
Fig. 15. Formant tracks determined by the linear prediction method, shown on the spectrogram
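One common way to obtain formants from a linear predictor, sketched below in Python (NumPy assumed), is to find the roots of the prediction-error polynomial: the angle of each complex root gives a resonance frequency and its distance from the unit circle gives the bandwidth. The function name `formants_from_lpc` and the two example resonances are hypothetical; the article does not state which procedure its author uses.

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Resonance frequencies and bandwidths (Hz) from predictor weights a[k],
    where x[n] is predicted as sum_k a[k] * x[n-1-k]."""
    error_poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(error_poly)
    roots = roots[np.imag(roots) > 0]          # keep one root of each pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]

def resonator(radius, freq_hz, fs):
    """Second-order polynomial with a pole pair at the given frequency."""
    theta = 2 * np.pi * freq_hz / fs
    return np.array([1.0, -2 * radius * np.cos(theta), radius ** 2])

fs = 8000                                      # sampling rate in Hz (assumed)
# Hypothetical predictor with resonances at 500 Hz and 1500 Hz.
error_filter = np.polymul(resonator(0.95, 500, fs), resonator(0.90, 1500, fs))
a = -error_filter[1:]

freqs, bandwidths = formants_from_lpc(a, fs)
print(np.round(freqs, 1), np.round(bandwidths, 1))
```

Applying the same steps to predictor weights estimated frame by frame would yield formant tracks over time.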
Excitation signal (voice)
The function of the vocal folds is relatively well understood (much better than the function of the basilar membrane in the ear). Changes in air flow related to the cyclic operation of the folds are described by simplified models built from simple curves (function graphs). One of the most popular is the LF (Liljencrants-Fant) model. Without going into the formulas that define the individual fragments of the curve, the course of the air flow changes can be presented in simplified form, as in the diagram in Fig. 16. The lower graph, showing the rate of change of the air flow, is especially important. It reaches its maximum at the moment the vocal folds open fastest, and its minimum at the moment they close fastest, just before full closure (analogous to a door slammed by a draft, which moves fastest just before it shuts).
Fig. 16.
Using the markings from the graphs in Fig. 16, one can define the open quotient (the ratio of the time the folds are open to the length of the basic period):
Qo = Tc/T0,
and the closed quotient (the ratio of the time the folds are closed to the length of the period):
Qz = (T0 - Tc)/T0.
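A short worked example in Python, with made-up values of T0 and Tc (they are not taken from Fig. 16), shows how the two quantities relate:

```python
def qo(tc, t0):
    """Qo = Tc / T0."""
    return tc / t0

def qz(tc, t0):
    """Qz = (T0 - Tc) / T0."""
    return (t0 - tc) / t0

t0 = 0.008    # assumed basic period: 8 ms (125 Hz)
tc = 0.005    # assumed Tc: 5 ms

q_open = qo(tc, t0)
q_closed = qz(tc, t0)
print(q_open, q_closed, q_open + q_closed)
```

By construction the two values always add up to 1, so either of them fully describes the split of the period.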
The relationship between the signal recorded by the microphone, the excitation signal, and the graphs in Fig. 16 is shown in Fig. 17.
Fig. 17. The signal from the microphone and the excitation signal reconstructed from it using the inverse filtering method
Parametric acoustic analysis
Fourier analysis and linear prediction, despite being extremely useful, usually produce graphs, and interpreting graphs is always somewhat subjective. To ensure objective diagnostics, an automatic analysis of the speech signal is performed, resulting in a set of parameters. These parameters are often calculated using the results of both analyses.
Basic parameters
The most commonly used parameters are:
- F0dev - standard deviation of the fundamental frequency, measuring long-term frequency stability;
- jitter - measuring irregularities in the length of basic periods, i.e. short-term (period-to-period) changes in F0;
- shimmer - measuring period-to-period irregularities in the amplitude of the signal;
- NHR - specifying the content of non-harmonic components in the range of higher frequencies in relation to the harmonic components of lower frequency.
From a mathematical point of view, the above parameters are very simple to calculate (jitter - the average relative difference in the length of adjacent periods, taken over all successive pairs in the entire analyzed waveform; shimmer - analogously, but for amplitudes; NHR - the ratio of the non-harmonic components from Fig. 10b at frequencies above approx. 1200 Hz to the harmonic components at lower frequencies). However, they rely on values of the fundamental frequency (basic periods) or spectral analyses that have to be determined beforehand in a much more complicated way.
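A minimal Python sketch of these calculations is given below (NumPy assumed). It follows one common convention, assumed here: the mean absolute difference between neighbouring values divided by the mean value. The exact normalization used by the author may differ, and the period lengths and amplitudes are made-up example data.

```python
import numpy as np

def relative_perturbation(values):
    """Mean absolute difference of neighbours, relative to the mean value."""
    values = np.asarray(values, dtype=float)
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical measurements for five consecutive basic periods.
periods = [0.0080, 0.0081, 0.0079, 0.0080, 0.0082]    # lengths in seconds
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]           # peak amplitudes

jitter = relative_perturbation(periods)
shimmer = relative_perturbation(amplitudes)
f0_dev = np.std(1.0 / np.asarray(periods))            # F0 standard deviation, Hz

print(f"jitter = {100 * jitter:.2f} %")
print(f"shimmer = {100 * shimmer:.2f} %")
print(f"F0dev = {f0_dev:.2f} Hz")
```

The arithmetic is indeed trivial; the hard part, not shown here, is extracting reliable period lengths and amplitudes from a real recording.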
Additional parameters
The basic parameters are universal, and therefore quite general. They are sensitive to most possible voice disorders and do not allow for more precise differentiation. Therefore, sets of supplementary parameters are introduced - more precisely "tuned" to individual disorders. The most important parameters are presented in Table 1.
Table 1. Selected additional parameters
| Parameter | Description |
|---|---|
| HPQ (harmonic perturbation quotient) | Parameter specifying the constancy of the shape of the basic periods. By design, insensitive to differences in the length of the base periods |
| HPQh | As HPQ, but only for components above 1200 Hz |
| RHPQ (residual harmonic perturbation quotient) | Similar to HPQ, but the analysis is performed for the excitation signal restored from the microphone signal |
| RHPQh | As RHPQ, but only for components above 1200 Hz |
| R2H (residual to harmonic) | Parameter that determines the dynamics of vocal folds closure - sensitive to small organic changes |
| U2H (unharmonic to harmonic) | Parameter determining the ratio of the non-harmonic part to the harmonic part - it reflects the level of both noise and distortion |
| U2Hl | Similar to U2H, but for the lower (up to 4000 Hz) part of the spectrum - it reflects the level of speech signal distortion |
| U2Hh | Similar to U2H, but for the upper (above 4000 Hz) part of the spectrum - it mainly reflects the noise level |
| S2H (subharmonic to harmonic) | Parameter specifying the ratio of the amplitude of subharmonics to harmonics for the lower (up to 4000 Hz) part of the spectrum - the level of distortions related to unequal vibration of the two folds |
| Q | A parameter that determines the frequency above which the harmonics do not significantly dominate the noise and distortion |
| Yg | Automatically determined Yanagihara coefficient |
Parameter grouping
Most of the parameters are created in several varieties. Knowing about their interdependence, it is easy to put them together into groups.
The following parameter groups can be distinguished:
- F0;
- jitter and derivatives (RAP, PPQ);
- shimmer and derivatives (APQ);
- HPQ and derivatives (HPQh, RHPQ, RHPQh);
- R2H;
- U2H and derivatives (U2Hl, U2Hh);
- S2H (somewhat similar to U2H);
- NHR;
- Yg, Q;
- voice field, F0 standard deviation, amplitude standard deviation.