
Pitch tracking algorithm based on evolutionary computing with regularisation in very low SNR

The authors present PTEAR_VLSNR (Pitch Tracking basing on Evolutionary Algorithm with Regularization at Very Low SNR), a pitch tracking algorithm for speech in strong noise. The algorithm builds a pitch enhancement and extraction model that enhances the pitch with a matched filter. To cope further with strong noise, an optimal factor is proposed, which is optimised globally by evolutionary computing. Specifically, a regularisation constraint is applied to the fitness function to improve generalisation ability. Temporal dynamics constraints are used to improve the tracking rate, and the voicing decision is likewise optimised by evolutionary computing. In addition, the balance between optimisation accuracy and time cost is considered. In the experiments, a genetic algorithm and particle swarm optimisation, each with a two-norm term, serve as the evolutionary algorithms with regularisation. Finally, the authors compare the performance of the algorithm with that of other representative algorithms. The experimental results show that the proposed algorithm performs well at both high and low signal-to-noise ratios (SNRs).
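To make the idea of an evolutionary search over a regularised fitness function concrete, the following is a minimal sketch of particle swarm optimisation with a two-norm penalty added to the objective. It is not the authors' implementation: the abstract does not give the actual fitness function, the optimal factor, or any parameter values, so the names `regularised_fitness`, `pso_minimise`, and `toy_loss`, the weight `lam`, and the swarm settings are all assumptions chosen for illustration, and a toy quadratic loss stands in for the pitch-model objective.

```python
import random


def regularised_fitness(params, loss, lam):
    """Return loss(params) plus lam times the squared two-norm of params."""
    return loss(params) + lam * sum(p * p for p in params)


def pso_minimise(fitness, dim, n_particles=20, n_iters=200,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO; all settings here are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    """Hypothetical stand-in objective with its unregularised minimum at (3, -1)."""
    return (params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2


lam = 0.1  # assumed regularisation weight
best, best_val = pso_minimise(
    lambda p: regularised_fitness(p, toy_loss, lam), dim=2)
print(best, best_val)
```

In this toy setting the two-norm term pulls the optimum from (3, -1) towards the origin, to 3/(1 + lam) and -1/(1 + lam) per coordinate, which illustrates how the penalty discourages large parameter values. A genetic algorithm could be substituted for the swarm update while keeping the same regularised fitness.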

http://iet.metastore.ingenta.com/content/journals/10.1049/joe.2018.8290