Statistical Modelling
Recent progress in speech enhancement relies to a large extent on improved statistical modeling. In recent years we have developed estimators based on supergaussian (leptokurtic) probability densities. When the span of correlation of a speech signal is larger than the length of the window used for spectral analysis, the spectral coefficients are not necessarily Gaussian distributed. Minimum mean square error (MMSE) estimators as well as soft-decision weighting functions were derived for various combinations of Gaussian, Laplacian, and Gamma densities of speech and noise coefficients. Compared with the well-known Gaussian solutions, these estimators lead to improved performance.
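"Supergaussian" (leptokurtic) means that the density has a kurtosis larger than the Gaussian value of 3: it is more peaked at zero and has heavier tails. The sketch below is only an illustration of that property, not one of the estimators referred to above. It draws zero-mean, unit-variance samples from the three densities and estimates their kurtosis. The Gamma model is assumed here to be a two-sided Gamma density with shape parameter 1/2; that choice, the helper names `draw` and `sample_kurtosis`, the seed, and the sample size are illustrative assumptions.

```python
import math
import random


def sample_kurtosis(xs):
    """Kurtosis E[(x - mu)^4] / Var(x)^2 of a sample (equals 3 for a Gaussian)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


def draw(model, rng):
    """Draw one zero-mean, unit-variance sample from the named model density."""
    if model == "gauss":
        return rng.gauss(0.0, 1.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0  # symmetric densities: random sign
    if model == "laplace":
        # |x| ~ Exponential(rate sqrt(2)), i.e. scale b = 1/sqrt(2); variance 2*b^2 = 1
        return sign * rng.expovariate(math.sqrt(2.0))
    if model == "gamma":
        # Assumed two-sided Gamma with shape 1/2: |x| ~ Gamma(0.5, scale b),
        # b = 2/sqrt(3) so that the variance 0.5 * 1.5 * b^2 equals 1
        return sign * rng.gammavariate(0.5, 2.0 / math.sqrt(3.0))
    raise ValueError(f"unknown model: {model}")


rng = random.Random(0)  # fixed seed for reproducibility
N = 200_000
kurt = {m: sample_kurtosis([draw(m, rng) for _ in range(N)])
        for m in ("gauss", "laplace", "gamma")}
for model, k in kurt.items():
    print(f"{model:8s} kurtosis ~ {k:.2f}")
```

The theoretical values are 3 for the Gaussian, 6 for the Laplacian, and 35/3 (about 11.7) for the assumed Gamma model, so the printed estimates should land near these numbers and both non-Gaussian models come out leptokurtic.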
The two figures below show a histogram (shaded areas) of the real part of the DFT coefficients of undisturbed speech together with three model densities. The dotted, dashed, and solid lines depict the Gaussian, Laplacian, and Gamma densities, respectively. The figure on the right-hand side shows an enlargement of the range of positive DFT values. We conclude that the Gaussian density does not provide a good fit to the observed data; the speech data appear to follow a heavy-tailed distribution.
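The three model densities can be tabulated directly. The sketch below evaluates unit-variance versions of them at a few positive values, in the spirit of the enlarged view of the positive range. The Gamma model is assumed to be a two-sided Gamma density with shape parameter 1/2, and the function names are hypothetical; no speech data are involved, so this only shows how the model shapes differ, not how well they fit.

```python
import math


def gauss_pdf(x, var=1.0):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def laplace_pdf(x, var=1.0):
    """Zero-mean Laplacian density; scale b = sqrt(var/2) gives variance 2*b^2 = var."""
    b = math.sqrt(var / 2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)


def gamma_pdf(x, var=1.0):
    """Assumed two-sided Gamma density with shape 1/2:
    p(x) = |x|^(-1/2) exp(-|x|/b) / (2 sqrt(pi b)), with b = 2 sqrt(var/3).
    The density is unbounded at x = 0."""
    ax = abs(x)
    if ax == 0.0:
        return math.inf
    b = 2.0 * math.sqrt(var / 3.0)
    return math.exp(-ax / b) / (2.0 * math.sqrt(math.pi * b * ax))


print(f"{'x':>5} {'Gauss':>9} {'Laplace':>9} {'Gamma':>9}")
for x in (0.25, 1.0, 2.0, 4.0):
    print(f"{x:5.2f} {gauss_pdf(x):9.5f} {laplace_pdf(x):9.5f} {gamma_pdf(x):9.5f}")
```

At x = 1 the Gaussian gives the largest of the three values, while at x = 4 it lies more than an order of magnitude below both supergaussian densities: more mass near zero and in the tails, less at intermediate values.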
Compared with the Wiener filter, i.e., the estimator obtained under the Gaussian model, estimators based on supergaussian densities show a number of interesting properties. The figure below shows the gain function for three different a priori SNR values. For an a priori SNR of 0 dB we observe that the supergaussian model results in less attenuation when the input amplitudes are large. In this case it is highly likely that speech is present, and therefore less attenuation is beneficial. For small input amplitudes the supergaussian estimator provides more attenuation and thus greater noise reduction.
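The qualitative behavior of the gain function can be reproduced with a simplified scalar model. The sketch below assumes a real-valued coefficient Y = S + N, Gaussian noise N with variance 1, and a speech prior with variance ξ (the a priori SNR). It computes the MMSE estimate E[S | Y = y] by direct numerical integration and reports the gain E[S | y] / y. This is not the closed-form estimator from the references; the integration grid, the Laplacian speech prior used as the supergaussian example, and the function names are illustrative assumptions. With a Gaussian prior the result reduces to the Wiener gain ξ / (1 + ξ).

```python
import math


def posterior_mean(y, prior, noise_var=1.0, lo=-30.0, hi=30.0, n=12001):
    """MMSE estimate E[S | Y = y] for Y = S + N, N ~ Normal(0, noise_var).

    `prior` may be unnormalized; its constant cancels in the ratio.
    The integrals over s are approximated with the trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        s = lo + k * h
        w = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        p = w * math.exp(-(y - s) ** 2 / (2.0 * noise_var)) * prior(s)
        num += s * p
        den += p
    return num / den


def gaussian_prior(var):
    """Unnormalized zero-mean Gaussian speech prior with variance var."""
    return lambda s: math.exp(-s * s / (2.0 * var))


def laplacian_prior(var):
    """Unnormalized zero-mean Laplacian speech prior with variance var."""
    b = math.sqrt(var / 2.0)  # scale giving variance 2*b^2 = var
    return lambda s: math.exp(-abs(s) / b)


xi = 1.0  # a priori SNR of 0 dB: speech variance equals noise variance
for y in (0.5, 1.0, 2.0, 5.0):
    g_gauss = posterior_mean(y, gaussian_prior(xi)) / y
    g_lap = posterior_mean(y, laplacian_prior(xi)) / y
    print(f"y = {y:3.1f}: Wiener gain {g_gauss:.3f}, Laplacian-prior gain {g_lap:.3f}")
```

In this toy model the Wiener gain stays at 0.5 for every input, whereas the Laplacian-prior gain drops below 0.5 for small inputs and rises toward 1 for large inputs, matching the attenuation pattern described for 0 dB.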
References
Martin, R.: Speech Enhancement Based on Minimum Mean Square Error Estimation and Supergaussian Priors, IEEE Trans. Speech and Audio Processing, vol. 13, no. 5, pp. 845-856, 2005.
Breithaupt, C. and Martin, R.: MMSE Estimation of Magnitude-Squared DFT Coefficients with Supergaussian Priors, Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2003.
Martin, R. and Breithaupt, C.: Speech Enhancement in the DFT Domain Using Laplacian Speech Priors, Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pp. 87-90, 2003.
Martin, R.: Speech Enhancement Using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors, Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. 253-256, 2002.