Efficient harmonic peak detection of vowel sounds for enhanced voice activity detection
Voice activity detection (VAD) involves discriminating speech segments from background noise and is a critical step in numerous speech-related applications. However, distinguishing speech from noise based on the properties of noise is fallible, because it is difficult to predict and characterise the noise occurring in real life. In this study, the authors instead focus on the intrinsic characteristics of speech. The harmonic peaks of vowel sounds have higher energies than the other spectral components of speech and are the speech features most likely to survive in most cases of severe noise. Therefore, the energy differences between harmonic peaks and other spectral features show promise for enabling robust VAD. To exploit this feature, the harmonic peaks must be accurately located. For this purpose, this study proposes an efficient harmonic peak location detection (HPD) method. Based on extensive experiments conducted in the presence of various noise types and signal-to-noise ratios, we found that VAD with the proposed HPD approach outperforms existing VAD methods and does so with reasonable computational cost and higher robustness.