Parametric analysis
Author: Ph.D. Marcin Just (DiagNova Technologies)
Parametric analysis is the most objective form of speech signal analysis. Its main disadvantage is that most parameters are sensitive to errors in determining the fundamental frequency. Fortunately, such errors tend to increase a parameter strongly when its value is low and to decrease it only slightly when its value is high. As a result, they do not lead to the worst type of error in qualifying patients: classifying a patient with a disorder as healthy.
Short-term parameters determined in the diagnostic analysis module of the DiagnoScope Specialist program
| Parameter | Description |
|---|---|
| KeyF0 | The frequency around which the fundamental frequency is being searched for. |
| F0 | The fundamental frequency averaged over the entire sample length. |
| F1, F2, F3, F4 | Frequencies of formants. |
| E | Energy of one basic period, averaged over the length of the entire sample. The parameter has a practical value only after applying the calibration of the recording track. |
| A0 | Amplitude of the component corresponding to the fundamental frequency, averaged over the length of the entire sample. The parameter has a practical value only after applying the recording track calibration. |
| Voiced | A measure of the probability of phonation. |
| SimpleQ | A simplified measure of voice quality (a measure of harmonic structure disturbance). |
| Q | A parameter that determines the limit frequency, above which the non-harmonic components (mainly noise) become comparable with the harmonics. |
| Jitt (Jitter) | Calculated in the classical way, as the relative difference in the length of adjacent basic periods, averaged over the length of the entire sample. |
| RAP | A related measure of irregularity in the length of basic periods (a derivative of the Jitter parameter). |
| PPQ | A related measure of irregularity in the length of basic periods (a derivative of the Jitter parameter). |
| Shimm (Shimmer) | Calculated in a classical way, as the relative difference in amplitude of adjacent basic periods averaged over the length of the entire sample. |
| APQ | A related measure of irregularity in the amplitude of basic periods (a derivative of the Shimmer parameter). |
| HPQ (harmonic perturbation quotient) | Parameter defining the constancy of shape of basic periods. By design, insensitive to differences in the length of the base periods. Parameter specifying the spread of Fourier coefficients of the spectrum obtained for single basic periods T0 in the range up to 8000 Hz. |
| HPQh | Like HPQ, but only for components above 2400 Hz. |
| RHPQ (residual harmonic perturbation quotient) | Like HPQ, but analysis performed for the excitation signal reproduced from the microphone signal. |
| RHPQh | Like RHPQ, but only for components above 2400 Hz (up to 8000 Hz). |
| R2H (residual to harmonic) | Parameter defining the dynamics of vocal folds closure - sensitive to small organic changes. Parameter defining the ratio of the Fourier coefficients of T0 periods from the wave-form of the microphone signal to the same coefficients for the reconstructed original signal. |
| U2H (unharmonic to harmonic) | The ratio of the amplitudes of the non-harmonic part of the spectrum, generated for 4 fundamental periods, to the harmonic part; it reflects both the level of disturbances and the level of distortions. The most general parameter, calculated for all spectrum components below 1800 Hz. Each harmonic component is always compared with the non-harmonic components in its immediate vicinity, which makes the parameter independent of the characteristics of the microphone used. |
| U2Hl | Like U2H, but calculated for the environment of the 4 lowest harmonics (i.e. up to the frequency of 4F0, where F0 is averaged for groups of 4 consecutive fundamental periods). More sensitive to waveform distortions (unevenness, asymmetries in the work of the vocal folds). |
| U2Hh | Like U2H, but for the upper part of the spectrum (from 1800 Hz to 8000 Hz) - it rather determines the level of interference. |
| S2H (subharmonic to harmonic) | Like U2Hl, but only half frequencies are considered as non-harmonic components (0.5F0, 1.5F0, 2.5F0,…). This additionally increases the sensitivity of the parameter to the common disorders of the symmetry of the work of the vocal folds. |
| NHR | The ratio of the sum of the amplitudes of the non-harmonic part of the spectrum, generated for 4 basic periods, to the harmonic part; it mainly determines the noise level. The sum of the harmonic part includes all harmonic frequencies up to 1800 Hz, the sum of the non-harmonic part covers 1800 Hz to 8000 Hz. Because of the calculation method, the parameter is sensitive to changes in the microphone characteristics and may be subject to additional weight compensation. The compensation procedure does not compensate for changes caused by the microphone being placed too close to the patient's mouth. A safe minimum distance is approximately 10 cm. |
| Yg | Automatically determined Yanagihara coefficient. Continuous value in the range 0–4. |
For each parameter, two values are available: its average over the entire analyzed interval and its standard deviation, which describes its variability during phonation. The standard deviation is used for only three parameters: F0, A0 and E. For the last two, determining it is advisable even without calibration of the recording track.
Groups of short-term parameters in the diagnostic analysis module of the DiagnoScope Specialist program
The acoustic analysis parameters can be grouped by similarity of calculation or of function.
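The classical jitter and shimmer definitions from the table can be sketched as follows. This is a minimal illustration that assumes the common "local" formulation (mean absolute difference of adjacent periods divided by the mean value); the exact formula used by DiagnoScope is not given in the text, and the function name and sample data are hypothetical.

```python
from statistics import mean, pstdev

def relative_perturbation(values):
    """Mean absolute difference of adjacent values, relative to their mean, in %."""
    if len(values) < 2:
        raise ValueError("need at least two basic periods")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)

# Hypothetical data: lengths (ms) and amplitudes of consecutive basic periods.
period_lengths_ms = [8.0, 8.1, 7.9, 8.0, 8.2]
period_amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]

jitter_percent = relative_perturbation(period_lengths_ms)   # about 1.87 %
shimmer_percent = relative_perturbation(period_amplitudes)  # about 3.97 %

# Average and standard deviation of F0 over the whole sample.
# Population standard deviation is an assumption; the text does not say which is used.
f0_values_hz = [1000.0 / t for t in period_lengths_ms]
f0_mean = mean(f0_values_hz)
f0_stdev = pstdev(f0_values_hz)
```

The same helper serves both parameters because the two definitions differ only in the quantity measured per period (length for jitter, amplitude for shimmer).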
The following groups of parameters can be distinguished:
- Parameters measuring the character of the voice (features that are not a direct determinant of pathology): F0, F1, F2, F3, F4 (fundamental frequency and frequency of formants).
- Parameters directly measuring disturbances in the length of basic periods (jitter group): jitter and its derivatives (RAP, PPQ).
- Parameters that directly measure amplitude fluctuations for successive periods: shimmer and its derivatives (APQ).
- Parameters examining changes in the shape of the basic periods: HPQ and its derivatives (HPQh, RHPQ, RHPQh).
- Parameters measuring efficiency in generating an extensive formant structure (depending on many factors, including the dynamics of vocal fold closure): R2H.
- Parameters defining the harmonic structure: U2H and its derivatives (U2Hl, U2Hh), S2H, Yg, Q.
- Noise parameters: NHR, U2Hh, to a lesser extent Yg and Q.
- Phonation stability parameters: standard deviation F0, standard deviation of amplitude and/or energy, no phonation coefficient (NoPhonCoef), phonation break coefficient (BreaksCoef), depth of fundamental frequency modulation (F0ModDepth), energy modulation depth (EModDepth). A subgroup defining voice tremor can be distinguished here (F0ModDepth and EModDepth).
- Performance parameters: Phonation time (PhonTime), True phonation time (TruePhonTime), Performance coefficient (PerfCoef), Average performance (AveragePerf).
Simplified connections between parameters and voice disturbances:
- High noise level (correlated parameters - NHR, U2Hh, HPQh, Yg, Q, jitter group, and more weakly shimmer);
- Basic periods of uneven length (correlated parameters - jitter group, U2Hl, U2H, S2H, and more weakly HPQ, shimmer group and Yg);
- Differentiation of basic periods in terms of shape (formant intensity) (correlated parameters - shimmer group, HPQ, U2H, weaker jitter);
- Poor formant structure (R2H, NHR).
Linking parameters with disturbances is still a long way from diagnosis. The most important thing is to take all parameters into account at the same time - a diagnosis cannot be based on an elevated value of one parameter!
There is also an important general diagnostic principle - the rule of three parameters. It should be considered disturbing when parameters from three groups exceed the values considered normative.
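The rule of three parameters can be sketched as a simple count of groups in which at least one parameter exceeds its limit. The group membership follows the list above, with two simplifying assumptions: U2Hh is counted only in the noise group, and Yg and Q are left out. The limit values below are placeholders for illustration only, not normative values from DiagnoScope.

```python
# Parameter groups taken from the list above (simplified, see the assumptions).
GROUPS = {
    "jitter": ["Jitt", "RAP", "PPQ"],
    "shimmer": ["Shimm", "APQ"],
    "shape": ["HPQ", "HPQh", "RHPQ", "RHPQh"],
    "harmonic structure": ["U2H", "U2Hl", "S2H"],
    "noise": ["NHR", "U2Hh"],
}

def groups_exceeding(measured, upper_limits):
    """Return the groups in which at least one parameter exceeds its limit."""
    exceeded = []
    for group, names in GROUPS.items():
        for name in names:
            if (name in measured and name in upper_limits
                    and measured[name] > upper_limits[name]):
                exceeded.append(group)
                break  # one parameter is enough to count the group once
    return exceeded

def is_disturbing(measured, upper_limits, min_groups=3):
    """Rule of three: limits exceeded in at least `min_groups` groups."""
    return len(groups_exceeding(measured, upper_limits)) >= min_groups

# Placeholder limits and measurements (hypothetical, not clinical norms).
example_limits = {"Jitt": 1.0, "Shimm": 3.0, "HPQ": 10.0, "NHR": 0.2, "U2H": 0.1}
example_measured = {"Jitt": 1.4, "Shimm": 3.5, "HPQ": 8.0, "NHR": 0.25, "U2H": 0.05}

example_groups = groups_exceeding(example_measured, example_limits)
example_flag = is_disturbing(example_measured, example_limits)
```

Counting groups rather than individual parameters keeps several correlated parameters of one group (for example Jitt, RAP and PPQ) from triggering the rule on their own.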
Long-term parameters in the performance analysis module of the DiagnoScope program
The voice efficiency analysis module is characterized by parameters such as phonation time, true phonation time and the efficiency factor (Table 2).
| Parameter | Description |
|---|---|
| Phonation time (PhonTime) | Total length of all time intervals marked as containing phonation in the "Analysis range" stage (lower graph); in the current version it is always one interval. |
| True phonation time (TruePhonTime) | The cumulative length of all base periods contained within the intervals marked as containing phonations for which the Voiced value is not less than the minimum value set in the "Analysis range" step (horizontal line in the lower graph). |
| Non-phonation coefficient (NoPhonCoef) | The ratio of the total length of the basic periods marked as phonation, having a Voiced value below the minimum, to the phonation time. |
| Phonation breaks coefficient (BreaksCoef) | The ratio of the number of phonation breaks, i.e. continuous intervals with Voiced below the minimum within the intervals marked as phonation, to half of the total number of elementary periods (i.e. the largest possible number of pauses). |
| Voice performance coefficient (PerfCoef) | Numerical parameter dependent on the voice quality expressed by the values of three short-term parameters (Jitter, U2H, NHR) during the actual phonation and on the phonation time (the value is higher when the voice is "better" and when phonation is longer). For each basic period for which the value of the Voiced parameter is not lower than the minimum, the instantaneous value of short-term parameters in relation to the standard is determined. |
| Average Performance (AveragePerf) | The performance coefficient divided by the true phonation time; a measure of the average voice quality expressed by the values of three short-term parameters (Jitter, U2H, NHR) over the whole range of true phonation. |
| Fundamental frequency standard deviation (F0StDev) | Standard deviation of the F0 parameter determined using all basic periods for which the value of the Voiced parameter is not less than the minimum value. |
| Energy standard deviation (EStDev) | Standard deviation of the E parameter determined using all basic periods for which the value of the Voiced parameter is not less than the minimum value. |
| Fundamental frequency modulation depth (F0ModDepth) | The frequency of the highest component of the spectrum of the F0 parameter in the range from 1 Hz to 20 Hz, determined together for the phonation ranges. |
| Energy modulation depth (EModDepth) | The value of the largest component of the spectrum of the E parameter in the range from 1 Hz to 20 Hz, divided by the average value of the E parameter (constant component), determined together for the phonation ranges. |
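The four phonation-time parameters at the top of the table can be sketched from a list of basic periods, each with its length and Voiced value. This is a minimal illustration under stated assumptions: one marked interval, PhonTime approximated by the sum of the period lengths, and a hypothetical function name and input layout.

```python
def phonation_metrics(periods, min_voiced):
    """periods: list of (length_in_seconds, voiced_value) for one marked interval."""
    if not periods:
        raise ValueError("no basic periods in the marked interval")
    # Assumption: the marked interval is fully covered by the listed periods.
    phon_time = sum(length for length, _ in periods)
    true_phon_time = sum(length for length, v in periods if v >= min_voiced)
    # Count breaks: continuous runs of periods with Voiced below the minimum.
    breaks = 0
    in_break = False
    for _, voiced in periods:
        below = voiced < min_voiced
        if below and not in_break:
            breaks += 1
        in_break = below
    return {
        "PhonTime": phon_time,
        "TruePhonTime": true_phon_time,
        "NoPhonCoef": (phon_time - true_phon_time) / phon_time,
        # Half of the number of periods is the largest possible number of breaks.
        "BreaksCoef": breaks / (len(periods) / 2),
    }

# Hypothetical data: eight 10 ms periods, two breaks (periods 2-3 and period 5).
example_periods = [
    (0.01, 0.9), (0.01, 0.2), (0.01, 0.3), (0.01, 0.8),
    (0.01, 0.1), (0.01, 0.95), (0.01, 0.9), (0.01, 0.9),
]
example_metrics = phonation_metrics(example_periods, min_voiced=0.5)
```

With these data three of the eight periods fall below the minimum, so NoPhonCoef is 0.375, and the two breaks out of a possible four give a BreaksCoef of 0.5.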
Specific parameters determined in the speech segment analysis module of the DiagnoScope Specialist
The speech segment analysis module displays all known acoustic parameters and additional specific parameters determined only in the analysis of speech disfluency (Table 3).
| Parameter | Description |
|---|---|
| Average segment length | Average length of all analyzed segments in the sample; described in ms. |
| Minimum segment length | Shortest segment length of all analyzed segments in the sample; described in ms. |
| Maximum segment length | Longest segment length of all analyzed segments in the sample; described in ms. |
| Std. dev. of segment length | Standard deviation of the segment length determined over all analyzed segments; described in ms. |
| Average distance between segment starts | Average length of interval between segments, that is, the length measured between the beginning of one segment and the beginning of the next; described in ms. |
| Minimum distance between segment starts | The shortest length between segments, that is, the shortest length measured between the beginning of one segment and the beginning of the next; described in ms. |
| Maximum distance between segment starts | The longest length between segments, that is, the longest length measured between the beginning of one segment and the beginning of the next; described in ms. |
| Std. dev. of distance between segment starts | The standard deviation of the length of the interval between segments, that is, the length measured between the beginning of one segment and the beginning of the next, determined over all analyzed distances between segments; described in ms. |
| Jitter of distance between segment starts | Short-term deviation; the relative difference in the length of adjacent spacing between segments, that is, adjacent lengths between the beginning of one segment and the beginning of the next, averaged over the length of adjacent intervals between segments; described in %. |
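The segment parameters in the table can be sketched from a list of segment boundaries. This is a minimal illustration: the input layout and dictionary keys are hypothetical, population standard deviation is an assumption, and the jitter line assumes the same "mean adjacent difference over mean value" form as the classical jitter.

```python
from statistics import mean, pstdev

def segment_statistics(segments):
    """segments: list of (start_ms, end_ms), sorted by start time."""
    if not segments:
        raise ValueError("no segments")
    lengths = [end - start for start, end in segments]
    starts = [start for start, _ in segments]
    # Distance between segment starts: start of one segment to start of the next.
    distances = [b - a for a, b in zip(starts, starts[1:])]
    stats = {
        "avg_length_ms": mean(lengths),
        "min_length_ms": min(lengths),
        "max_length_ms": max(lengths),
        "std_length_ms": pstdev(lengths),
    }
    if distances:
        stats.update({
            "avg_distance_ms": mean(distances),
            "min_distance_ms": min(distances),
            "max_distance_ms": max(distances),
            "std_distance_ms": pstdev(distances),
        })
    if len(distances) >= 2:
        # Relative difference of adjacent distances, averaged, in %.
        diffs = [abs(b - a) for a, b in zip(distances, distances[1:])]
        stats["jitter_distance_percent"] = 100.0 * mean(diffs) / mean(distances)
    return stats

# Hypothetical segments (ms).
example_segments = [(0, 200), (500, 650), (900, 1150), (1500, 1700)]
example_stats = segment_statistics(example_segments)
```

For these segments the distances between starts are 500, 400 and 600 ms, so the average distance is 500 ms and the jitter of distance is 30 %.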
Singers' parameters determined in the DiagnoScope Specialist program
The module for singing voice analysis displays all known acoustic parameters and parameters characteristic only of singers' analysis as a function of the fundamental frequency F0 (Table 4).
| Parameter | Description |
|---|---|
| MonotonicKeyF0 | F0 profile converted to be monotonic (increasing or decreasing). |
| F0Diff | The difference between the instantaneous F0 value and the F0 profile. |
| F0DiffAbs | The absolute value of F0Diff. |
| F0DiffVibr | The difference between the instantaneous F0 value and the F0 profile with an additional algorithm that removes the vibrato from the F0 waveform (sinusoidal shape by default). |
| F0DiffVibrAbs | The absolute value of F0DiffVibr. |
| fc | Describes the tone brightness; spectral centroid; weighted average, in which the amplitude values for the samples of the spectrum are weights for the averaged frequency values. |
| fc/f0 | Describes the tone brightness relative to the fundamental tone; ratio of fc to the fundamental frequency; dimensionless quantity. |
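The spectral centroid fc and the fc/f0 ratio from the table can be sketched as an amplitude-weighted average of frequency. This is a minimal illustration; the function name and the four-harmonic spectrum are hypothetical.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Weighted average of frequency, with spectrum amplitudes as weights."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total

# Hypothetical spectrum: four harmonics of a 200 Hz tone.
f0_hz = 200.0
example_freqs_hz = [200.0, 400.0, 600.0, 800.0]
example_amplitudes = [1.0, 0.5, 0.25, 0.25]

fc_hz = spectral_centroid(example_freqs_hz, example_amplitudes)  # 375.0 Hz
fc_over_f0 = fc_hz / f0_hz                                       # 1.875
```

Dividing by f0 removes the dependence on pitch, so tones sung at different fundamental frequencies can be compared for brightness.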
The Long-Term Average Spectrum (LTAS) is a type of spectrum that indicates how much of the total energy is carried in a specific frequency band. It gives the average signal strength as a function of frequency.
| Parameter | Description |
|---|---|
| SPR (Singing Power Ratio) | The ratio of the energy of the highest peak in the band 2-4 kHz to the highest peak in the band 0-2 kHz (the higher the SPR, the better trained the voice). |
| ER (Energy Ratio) | The ratio of the spectral energy in the 2-4 kHz band to the spectral energy in the 0-2 kHz band. |
| α-1 | The ratio of the spectral energy in the 1-6 kHz band to the spectral energy in the 0-1 kHz band. |
| α-2 | The ratio of the spectral energy in the 2-6 kHz band to the spectral energy in the 0-2 kHz band. |
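The four LTAS ratios in the table can be sketched from a spectrum given as (frequency, energy) pairs. This is a minimal illustration: the function names and sample spectrum are hypothetical, the half-open band limits are an assumption, and the ratios are left linear because the text does not say whether DiagnoScope reports them in dB.

```python
def band_energy(spectrum, lo_hz, hi_hz):
    """Sum of energy for spectrum points with lo_hz <= frequency < hi_hz."""
    return sum(e for f, e in spectrum if lo_hz <= f < hi_hz)

def band_peak(spectrum, lo_hz, hi_hz):
    """Highest energy value among points with lo_hz <= frequency < hi_hz."""
    return max(e for f, e in spectrum if lo_hz <= f < hi_hz)

def ltas_ratios(spectrum):
    """spectrum: list of (frequency_hz, energy); each band must contain energy."""
    return {
        "SPR": band_peak(spectrum, 2000, 4000) / band_peak(spectrum, 0, 2000),
        "ER": band_energy(spectrum, 2000, 4000) / band_energy(spectrum, 0, 2000),
        "alpha1": band_energy(spectrum, 1000, 6000) / band_energy(spectrum, 0, 1000),
        "alpha2": band_energy(spectrum, 2000, 6000) / band_energy(spectrum, 0, 2000),
    }

# Hypothetical long-term average spectrum with five points.
example_ltas = [(500, 8.0), (1500, 4.0), (2500, 2.0), (3500, 1.0), (5000, 1.0)]
example_ratios = ltas_ratios(example_ltas)
```

SPR compares only the strongest peaks of the two bands, while ER, α-1 and α-2 compare total band energies, which is why SPR and ER can differ for the same 2 kHz split.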