Inverse filter

Signal processing is a subfield of electrical engineering that focuses on analysing, modifying, and synthesizing signals such as sound, images, and scientific measurements.[1] Within it, given a filter g, an inverse filter h is one such that applying g and then h to a signal returns the original signal. Inverse filters, implemented in software or electronics, are often used to compensate for unwanted filtering of a signal by its environment.
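The definition can be illustrated with a minimal sketch. The first-order filter g below, its coefficient a = 0.5, and the test signal are all assumptions chosen for illustration; they do not come from any particular application. Here g adds a scaled copy of the previous input sample, and h undoes this by subtracting a scaled copy of its own previous output.

```python
def apply_g(x, a=0.5):
    """Hypothetical filter g: y[n] = x[n] + a * x[n-1]."""
    y = []
    prev_in = 0.0
    for sample in x:
        y.append(sample + a * prev_in)
        prev_in = sample
    return y


def apply_h(y, a=0.5):
    """Inverse filter h: x[n] = y[n] - a * x[n-1] (stable when |a| < 1)."""
    x = []
    prev_out = 0.0
    for sample in y:
        current = sample - a * prev_out
        x.append(current)
        prev_out = current
    return x


# Arbitrary test signal (assumed values).
signal = [1.0, -2.0, 3.0, 0.5, 0.0, 4.0]

# Applying g and then h should return the original signal.
recovered = apply_h(apply_g(signal))
max_error = max(abs(r - s) for r, s in zip(recovered, signal))
```

In this sketch the inverse exists as a stable recursive filter only because |a| < 1; not every filter has a stable, causal inverse.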

In speech science

In all proposed models for the production of human speech, an important variable is the waveform of the airflow, or volume velocity, at the glottis. The glottal volume velocity waveform provides the link between movements of the vocal folds and the acoustical results of such movements, in that the glottis acts approximately as a source of volume velocity. That is, the impedance of the glottis is usually much higher than that of the vocal tract, and so glottal airflow is controlled mostly (but not entirely) by glottal area and subglottal pressure, and not by vocal-tract acoustics. This view of voiced speech production is often referred to as the source-filter model.

A technique for estimating the glottal volume velocity waveform during voiced speech is the “inverse-filtering” of one of two signals: the radiated acoustic waveform, as measured by a microphone with a good low-frequency response, or the volume velocity at the mouth, as measured by a pneumotachograph. The pneumotachograph must have a linear response, cause little speech distortion, and have a response time of under approximately 1/2 ms. A pneumotachograph with these properties was first described by Rothenberg,[2] who termed it a circumferentially vented mask, or CV mask.

As practiced, inverse-filtering is usually limited to non-nasalized or slightly nasalized vowels. The recorded waveform is passed through an “inverse-filter” whose transfer characteristic is the inverse of the transfer characteristic of the supraglottal vocal tract configuration at that moment. The transfer characteristic of the supraglottal vocal tract is defined with the input to the vocal tract considered to be the volume velocity at the glottis. For non-nasalized vowels, assuming a high-impedance volume velocity source at the glottis, the transfer function of the vocal tract below about 3000 Hz contains a number of pairs of complex-conjugate poles, more commonly referred to as resonances or formants. Thus, an inverse-filter would have a pair of complex-conjugate zeroes, more commonly referred to as an anti-resonance, for every vocal tract formant in the frequency range of interest.
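The pole-zero cancellation described above can be sketched in discrete time. Everything specific in this example is an assumption made for illustration: the 10 kHz sampling rate, the three formant frequencies and bandwidths, the toy glottal flow pulse train, and the standard second-order digital resonator used to model each formant. It is not a reconstruction of any published inverse-filtering system.

```python
import math


def formant_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Map a formant (centre frequency, bandwidth) to pole-pair coefficients."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius
    theta = 2.0 * math.pi * freq_hz / fs_hz        # pole angle
    return 2.0 * r * math.cos(theta), -(r * r)     # a1, a2


def resonance(x, a1, a2):
    """Complex-conjugate pole pair: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    y = []
    y1 = y2 = 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def anti_resonance(y, a1, a2):
    """Complex-conjugate zero pair: x[n] = y[n] - a1*y[n-1] - a2*y[n-2]."""
    x = []
    y1 = y2 = 0.0
    for sample in y:
        x.append(sample - a1 * y1 - a2 * y2)
        y2, y1 = y1, sample
    return x


FS_HZ = 10000.0  # assumed sampling rate
# Hypothetical (frequency, bandwidth) pairs in Hz, all below 3000 Hz.
FORMANTS = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]

# Toy glottal flow: half-rectified sine pulses (assumed shape).
glottal_flow = [max(0.0, math.sin(2.0 * math.pi * n / 80.0)) for n in range(400)]

# Forward model: one resonance per formant gives the flow at the mouth.
mouth_flow = glottal_flow
for freq, bw in FORMANTS:
    mouth_flow = resonance(mouth_flow, *formant_coeffs(freq, bw, FS_HZ))

# Inverse filter: one anti-resonance per formant.
estimate = mouth_flow
for freq, bw in FORMANTS:
    estimate = anti_resonance(estimate, *formant_coeffs(freq, bw, FS_HZ))

max_error = max(abs(e - g) for e, g in zip(estimate, glottal_flow))
```

The recovery here is near-exact only because the inverse filter uses the same formant values as the forward model. With recorded speech the formants are unknown and must be estimated, so the result is an approximation.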

If the input is from a microphone, and not a CV mask or its equivalent, the inverse filter also must have a pole at zero frequency (an integration operation) to account for the radiation characteristic that connects volume velocity with acoustic pressure. Inverse filtering the output of a CV mask retains the level of zero flow,[2] while inverse filtering a microphone signal does not.
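A toy discrete-time sketch shows why the zero-flow level is lost with a microphone. It assumes the radiation characteristic can be approximated by a first difference (a discrete stand-in for differentiation) and the integration by a running sum; the flow values are hypothetical. Two flows that differ only by a constant offset produce the same pressure signal, so integration can recover the flow only up to an unknown constant.

```python
def radiate(flow):
    """Toy radiation characteristic: pressure ~ first difference of flow."""
    return [flow[n] - flow[n - 1] for n in range(1, len(flow))]


def integrate(pressure):
    """Running sum (pole at zero frequency), starting from an arbitrary 0."""
    out = [0.0]
    for p in pressure:
        out.append(out[-1] + p)
    return out


# Hypothetical flow samples with a steady 0.2 leakage level between pulses.
flow = [0.2, 0.2, 0.5, 0.9, 0.6, 0.2, 0.2]
# The same waveform with a different constant (zero-flow) level.
shifted = [v + 1.0 for v in flow]

pressure = radiate(flow)
pressure_shifted = radiate(shifted)

# Both flows give (numerically) the same pressure signal.
pressure_gap = max(abs(a - b) for a, b in zip(pressure, pressure_shifted))

# Integration recovers the waveform shape, but relative to its first sample.
estimate = integrate(pressure)
offset_error = max(abs(e - (f - flow[0])) for e, f in zip(estimate, flow))
```

The estimate matches the true flow minus its first sample, so the absolute level of zero flow has to come from other information, such as a flow measurement.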

Inverse filtering depends on the source-filter model and on a vocal tract filter that is a linear system; however, the source and filter need not be independent.

References

  1. Sengupta, Nandini; Sahidullah, Md; Saha, Goutam (August 2016). "Lung sound classification using cepstral-based statistical features". Computers in Biology and Medicine. 75 (1): 118–129. doi:10.1016/j.compbiomed.2016.05.013. PMID 27286184.
  2. Rothenberg, M. (1973). "A new inverse-filtering technique for deriving the glottal air flow waveform during voicing". Journal of the Acoustical Society of America. 53 (6): 1632–1645.