© The Institution of Engineering and Technology
The authors present an algorithm for pitch estimation, including a voiced/unvoiced decision, for noisy speech and for the case where two speakers talk simultaneously. The approach is based on spectral multi-scale product (SMP) analysis of the sound mixture. The SMP is the spectrum of the product of the speech signal's wavelet transform coefficients at three successive scales. The wavelet used for SMP analysis is the quadratic spline function. The proposed method is compared with other state-of-the-art algorithms. It is robust in the presence of noise and estimates the pitch of both the dominant speech and the concurrent speech from the sound mixture with high accuracy.
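As an illustration of the SMP idea only, and not the authors' implementation, the following Python sketch computes a multi-scale product and picks the strongest spectral peak as a single pitch candidate. Several details are assumptions that the abstract does not specify: the dyadic scales (2, 4, 8), the discrete approximation of the quadratic spline wavelet as the first difference of a cubic B-spline, the Hamming window, the zero-padding factor and the 60 to 500 Hz search band. All function names are hypothetical. The voiced/unvoiced decision and the estimation of the second speaker's pitch are not covered.

```python
import numpy as np


def spline_wavelet(scale):
    """Approximate quadratic spline wavelet at an integer scale (assumption).

    A cubic B-spline is built by convolving a box of width `scale` with
    itself three times; its first difference is a piecewise-quadratic,
    zero-mean kernel.
    """
    box = np.ones(scale) / scale
    smooth = box
    for _ in range(3):
        smooth = np.convolve(smooth, box)
    return np.diff(smooth, prepend=0.0, append=0.0)


def multiscale_product(frame, scales=(2, 4, 8)):
    """Pointwise product of wavelet coefficients at three successive scales."""
    frame = np.asarray(frame, dtype=float)
    product = np.ones_like(frame)
    for s in scales:
        product *= np.convolve(frame, spline_wavelet(s), mode="same")
    return product


def smp_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Naive pitch candidate: strongest SMP peak inside [fmin, fmax] Hz."""
    mp = multiscale_product(frame)
    n_fft = 4 * len(mp)  # zero-padding for a finer frequency grid
    spectrum = np.abs(np.fft.rfft(mp * np.hamming(len(mp)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]
```

For a quick check, `smp_pitch(frame, fs)` can be called on one frame of a synthetic periodic signal with sharp edges, such as a sawtooth. Plain peak picking of this kind can lock onto a harmonic when several harmonics fall inside the search band, so a practical tracker would need an additional harmonic check.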
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2010.0030