

CASA For Improving Speech Intelligibility in Monaural Speech Separation

Volume 13, Number 3, May 2017 - Paper 2 - pp. 259-263
DOI: 10.23940/ijpe.17.03.p2.259263

M. Dharmalingam (a) and M. C. John Wiselin (b)

(a) PRIST University, Thanjavur, Tamilnadu, India / Kongunadu College of Engineering and Technology, Trichy, Tamilnadu, India
(b) Department of EEE, Vidya Academy of Science & Technology, Thrissur, Kerala, India

(Submitted on December 14, 2016; Revised on March 8, 2017; Accepted on March 16, 2017)


Speech separation is the process of separating target speech and noise from a noisy speech mixture. Speech separation algorithms are useful for improving the quality and intelligibility of speech. Traditional speech separation algorithms, such as spectral-subtractive algorithms, Wiener filtering, statistical model-based methods and subspace algorithms, mainly focus on improving speech quality. However, applications such as mobile communication, air-ground communication and hearing aids need speech intelligibility more than speech quality. To meet this requirement, this work proposes an algorithm that uses Computational Auditory Scene Analysis (CASA) and a Support Vector Machine (SVM) to separate noisy speech into target speech and noise while also improving speech intelligibility. The proposed algorithm decomposes the clean speech and the noise into time-frequency (T-F) units and computes the energy of each frame of clean speech and noise to train the SVM. Later, in the testing phase, the trained SVM estimates a binary mask from the energy of the noisy speech, based on whether each T-F unit is dominated by speech or by noise. The mask estimated by the SVM is used to synthesize the speech signal, which is presented to normal-hearing listeners of different age groups to measure the performance of the proposed algorithm. The experimental results show substantial improvements in recognition score, indicating that the separated speech has reasonable intelligibility.
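The labelling step described in the abstract, deciding whether each T-F unit is dominated by speech or by noise, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `frame_energies` and `ideal_binary_mask`, the 0 dB threshold, and the frame length are assumptions, and each "channel" is assumed to be the output of a filterbank that is not shown here. The resulting 0/1 labels are the kind of target an SVM could be trained on; the SVM itself is omitted.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Return the energy (sum of squares) of each frame of a 1-D signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Label each T-F unit 1 if its speech-to-noise ratio exceeds threshold_db, else 0.

    Both inputs are 2-D lists indexed as [channel][frame].
    The 0 dB default threshold is an assumption, not a value from the paper.
    """
    eps = 1e-12  # avoids log of zero and division by zero for silent units
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy data: two hypothetical filterbank channels, four samples each.
speech_channels = [[1.0, 1.0, 0.1, 0.1], [0.1, 0.1, 1.0, 1.0]]
noise_channels = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]

speech_e = [frame_energies(ch, frame_len=2, hop=2) for ch in speech_channels]
noise_e = [frame_energies(ch, frame_len=2, hop=2) for ch in noise_channels]
mask = ideal_binary_mask(speech_e, noise_e)
print(mask)
```

In the testing phase only the noisy mixture is available, so a trained classifier would have to predict these labels from the mixture's energy features rather than compute them from the separate clean speech and noise signals.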



