Spectrograms
Author: Ph.D. Marcin Just (DiagNova Technologies)
The analysis of acoustic waveforms based on spectrograms is a type of long-term analysis. It is usually applied to recorded words and sentences, although it can also be applied to single sounds. Unlike parametric analyses, it is much more subjective and requires more experience. Carried out correctly, it can provide a large amount of useful information. A spectrogram cannot be affected by a gross error during its computation, so it is a very reliable diagnostic tool. Spectrograms are also often the only diagnostic tool in cases of very severe pathology, after surgical procedures, and for "prosthetic" voices.
Spectrogram interpretation
Narrowband and broadband spectrograms are interpreted in completely different ways. The only common convention is that a darker shade (in grayscale spectrograms) means more energy at a given frequency and time.
Narrowband spectrogram interpretation
An example narrowband spectrogram is shown in Fig. 2.
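The grayscale convention can be illustrated with a minimal sketch. It assumes the energy is expressed in decibels relative to the strongest point of the spectrogram and clipped to a fixed dynamic range. The function name `power_to_gray` and the 50 dB default are illustrative assumptions, not taken from any particular analysis program.

```python
import math

def power_to_gray(power, max_power, dynamic_range_db=50.0):
    """Map a power value to an 8-bit gray level (0 = black, 255 = white).

    Assumption: the display spans `dynamic_range_db` below the peak power;
    anything weaker than that is drawn as white.
    """
    if power <= 0.0:
        return 255  # no energy at all: white
    # Level relative to the peak, in dB (0 dB at the peak, negative below it)
    level_db = 10.0 * math.log10(power / max_power)
    level_db = min(0.0, max(level_db, -dynamic_range_db))
    # 0 dB -> 0 (black), -dynamic_range_db -> 255 (white)
    return round(255 * (-level_db / dynamic_range_db))
```

With this mapping, a point carrying ten times more energy than another is always drawn darker, as long as both lie within the displayed dynamic range.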
Fig. 2. Narrowband spectrogram
The basic information-carrying structure here is a set of horizontal lines representing consecutive harmonics (counted from the bottom). The lowest line represents the course of the fundamental frequency (F0). The number of lines visible in the spectrogram indicates the frequency up to which the harmonic structure extends. The formant structure can also be observed (marked in Fig. 2a).
Fig. 2a. Narrowband spectrogram with marked formant structure
Noise (at higher frequencies) and distortions (at lower frequencies) appear between the dark lines representing harmonics. The degree of contrast between the lines and the spaces separating them is therefore a measure of the disturbance of the speech signal: the lower the contrast, the more disturbed the signal.
When interpreting spectrograms, attention should be paid to the frequently occurring mains hum (interference caused by the field generated by the electrical power network). It appears in spectrograms as a horizontal band at a frequency of 50 Hz (much less often also at 100 Hz) that extends into periods without phonation. An example of such a disturbance is shown in Fig. 2b. Particular care should be taken not to mistake this type of disturbance for a pathologically lowered fundamental frequency.
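Because the hum persists when there is no phonation, one way to check for it is to measure how much of the power in a silent fragment is concentrated at 50 Hz. The sketch below uses the Goertzel recurrence to evaluate a single frequency. It is only an illustration: the function names, the choice of a silent fragment as input, and any decision threshold applied to the result are assumptions, not part of the described method.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Normalized power of `samples` at `freq` (Hz) via the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the spectrum at `freq`, divided by n^2
    return (s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2) / (n * n)

def hum_fraction(silent_segment, sample_rate, mains_hz=50.0):
    """Approximate fraction of the segment's power located at `mains_hz`.

    Close to 1.0 for a pure tone at the mains frequency, close to 0.0
    when the segment contains no such component.
    """
    n = len(silent_segment)
    mean_square = sum(x * x for x in silent_segment) / n
    if mean_square == 0.0:
        return 0.0
    # A sine of amplitude A has mean square A^2/2 and Goertzel power A^2/4
    return 2.0 * goertzel_power(silent_segment, sample_rate, mains_hz) / mean_square
```

A value near 1 in a fragment without phonation suggests that a low horizontal band is hum rather than F0. The same check can be repeated with `mains_hz=100.0`.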
Fig. 2b. Hum on narrowband spectrogram
Broadband spectrogram interpretation
An example of a broadband spectrogram is shown in Fig. 3.
Fig. 3. Broadband spectrogram
Due to the wide bandwidth of such a spectrogram (240 Hz), no horizontal lines related to harmonic frequencies can be observed.
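The effect of the analysis bandwidth can be sketched with a simple rule of thumb: adjacent harmonics are spaced F0 apart, so they appear as separate lines only when F0 exceeds the analysis bandwidth. The helper names below are illustrative. The 45 Hz narrowband value in the usage lines is an assumed, typical figure and is not given in this text, and the 1/bandwidth window length is only approximate, since the exact constant depends on the window shape.

```python
def harmonics_resolved(f0_hz, bandwidth_hz):
    """True if harmonics spaced `f0_hz` apart are wider apart than the analysis bandwidth."""
    return f0_hz > bandwidth_hz

def approx_window_ms(bandwidth_hz):
    """Rough analysis window length in milliseconds (assumes duration ~ 1 / bandwidth)."""
    return 1000.0 / bandwidth_hz

# Example: a voice with F0 = 120 Hz
broadband_hz = 240.0   # bandwidth quoted in the text
narrowband_hz = 45.0   # assumed typical narrowband value (not from the text)

print(harmonics_resolved(120.0, broadband_hz))   # harmonics merge into formant bands
print(harmonics_resolved(120.0, narrowband_hz))  # harmonics appear as separate lines
print(round(approx_window_ms(broadband_hz), 2), "ms window for the broadband case")
```

The short window that a wide bandwidth implies is also what gives the broadband spectrogram its fine time resolution.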
The basic information-carrying structure here is a set of thick horizontal bands marking the course of the formants (Fig. 3a). Additionally, vertical lines (visible, but not always!) make it possible to determine the moments at which the vocal folds close (just after closure, the harmonic structure is the richest and the amplitude of all frequency components is the largest).
Fig. 3a. Broadband spectrogram with the formant structure marked
The more distinct the vertical lines are, the more energy is generated when the folds close, and thus the more "effective" their work is. Any obstacle preventing the correct closure of the folds will reduce the contrast on the broadband spectrogram.
Fig. 4. Stretched broadband spectrogram with clearly visible vertical lines and the distribution of the signal energy within individual fundamental periods
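The distribution of energy within fundamental periods can be approximated in the time domain by computing the signal energy in frames much shorter than one period and comparing the strongest and weakest frames. This is only a rough stand-in for the visual contrast of the vertical lines: the function names, the frame length, and the use of a max/min ratio are assumptions made for illustration.

```python
import math

def short_time_energy(samples, frame_len):
    """Energy of consecutive, non-overlapping frames of `frame_len` samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def pulse_contrast_db(samples, frame_len, floor=1e-12):
    """Ratio (in dB) between the most and least energetic frames.

    `frame_len` should be much shorter than one fundamental period.
    `floor` avoids division by zero for completely silent frames.
    """
    energy = short_time_energy(samples, frame_len)
    return 10.0 * math.log10((max(energy) + floor) / (min(energy) + floor))
```

For a signal whose energy is concentrated just after each closure and then decays, the result is large. For a signal whose energy is spread evenly over the period, it approaches 0 dB.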