Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser (Biomedical Engineering Department, Amirkabir University of Technology)
  • Almasganj, Farshad (Biomedical Engineering Department, Amirkabir University of Technology)
  • Received : 2012.01.31
  • Accepted : 2012.07.09
  • Published : 2013.02.01

Abstract

In this paper, we propose a feature extraction (FE) method that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike conventional spectral-based FE methods, the proposed method evaluates the similarity between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. First, a set of Gaussian mixture models (GMMs) is trained to represent the speech attractors in the RPS. Then, for each new input speech frame, a feature vector based on posterior probabilities is computed; it represents the similarity between the embedded frame and the learned speech attractors. We conduct speech recognition experiments on FARSDAT, a well-known Persian speech corpus, using a toolkit based on hidden Markov models. The proposed FE method yields a 3.11% absolute improvement in phoneme error rate over the baseline system, which uses mel-frequency cepstral coefficient (MFCC) features.
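
The pipeline described in the abstract (embed a frame in the RPS, score it against a set of attractor GMMs, and turn the scores into a posterior-probability feature vector) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions made for the sketch: the embedding dimension `dim=3` and delay `tau=6`, the zero-mean/unit-variance frame normalization, one diagonal-covariance GMM per attractor class trained by plain EM, equal class priors, and scoring by per-point average log-likelihood. The names `normalize`, `embed`, `DiagGMM`, and `attractor_posteriors` are hypothetical. In the demo, two synthetic noisy sinusoids stand in for speech attractors.

```python
import numpy as np


def normalize(x):
    """Zero-mean, unit-variance scaling of a signal or frame (assumed preprocessing)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)


def embed(x, dim=3, tau=6):
    """Time-delay embedding: row n is [x[n], x[n+tau], ..., x[n+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for this dim/tau")
    return np.stack([x[i * tau : i * tau + n_points] for i in range(dim)], axis=1)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture fitted with plain EM."""

    def __init__(self, n_components=8, n_iter=50, var_floor=1e-3, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.var_floor, self.seed = var_floor, seed

    def _log_joint(self, X):
        # log w_k + log N(x_n | mu_k, diag(var_k)), shape (N, K)
        diff = X[:, None, :] - self.means[None, :, :]
        mahal = np.sum(diff ** 2 / self.vars[None, :, :], axis=2)
        log_det = np.sum(np.log(self.vars), axis=1)
        d = X.shape[1]
        return np.log(self.weights)[None, :] - 0.5 * (
            d * np.log(2 * np.pi) + log_det[None, :] + mahal)

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        n = X.shape[0]
        # Initialize means at random data points, variances at the global variance.
        self.means = X[rng.choice(n, self.k, replace=False)]
        self.vars = np.tile(X.var(axis=0) + self.var_floor, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: responsibilities of each component for each point.
            lj = self._log_joint(X)
            resp = np.exp(lj - logsumexp(lj, axis=1)[:, None])
            # M-step: re-estimate weights, means, and floored diagonal variances.
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            self.vars = np.maximum(
                (resp.T @ X ** 2) / nk[:, None] - self.means ** 2, self.var_floor)
        return self

    def mean_log_likelihood(self, X):
        """Average per-point log-likelihood of the embedded points."""
        return float(np.mean(logsumexp(self._log_joint(X), axis=1)))


def attractor_posteriors(frame, models, dim=3, tau=6):
    """Posterior of each attractor model for one frame, assuming equal priors."""
    X = embed(normalize(frame), dim, tau)
    ll = np.array([m.mean_log_likelihood(X) for m in models])
    p = np.exp(ll - ll.max())  # softmax over average log-likelihoods
    return p / p.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(4000)
    # Two synthetic stand-in "attractors": noisy sinusoids with different periods.
    train = [np.sin(2 * np.pi * t / p) + 0.05 * rng.standard_normal(t.size)
             for p in (40.3, 13.7)]
    models = [DiagGMM().fit(embed(normalize(x))) for x in train]
    frame = np.sin(2 * np.pi * np.arange(400) / 40.3 + 0.3)
    print(attractor_posteriors(frame, models))
```

Averaging the log-likelihood over the embedded points, rather than summing it, keeps the posteriors from saturating to one-hot vectors as the frame length grows. The abstract does not say which choice the original method makes.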
