In this paper, the authors address the tasks of audio source counting and separation for two-channel instantaneous mixtures. This goal is achieved in two stages. In the first stage, a novel scheme is proposed for estimating the number of sources and the corresponding channel intensity difference (CID) values. For this purpose, an angular spectrum is evaluated as a function of the ratio of the magnitude spectrograms of the two channels, and the peak locations of that spectrum are obtained. In the second stage, a new approach is developed for extracting the individual source signals using Bayesian non-parametric modelling. The mean-field variational Bayesian approach is applied to infer the unknown parameters. Classification is then performed on the inferred active CID values to obtain the individual source magnitude spectrograms. In this way, the number of spectral components used to model each source is found automatically from the data. The Bayesian approach is compared with the standard Kullback–Leibler non-negative tensor factorisation method to illustrate the effectiveness of Bayesian modelling. Separation performance is measured using existing metrics for multichannel blind source separation evaluation. The experiments are performed on instantaneous mixtures from the dev2 database.
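
The first stage can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes the angular spectrum is a plain energy-weighted histogram of the inter-channel angle atan2(|X2|, |X1|), and that peaks are simple local maxima above a relative threshold. The function names, the bin count, the threshold and the synthetic test mixture (one active source per time-frequency bin) are all hypothetical choices made for illustration.

```python
import numpy as np

def angular_spectrum(mag_left, mag_right, n_bins=90):
    """Energy-weighted histogram of the inter-channel angle (assumed form)."""
    # For non-negative magnitudes the angle lies in [0, pi/2].
    theta = np.arctan2(mag_right, mag_left)
    weight = mag_left**2 + mag_right**2
    hist, edges = np.histogram(theta, bins=n_bins,
                               range=(0.0, np.pi / 2), weights=weight)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

def pick_peaks(centers, hist, rel_threshold=0.1):
    """Return bin centres that are local maxima above a relative threshold."""
    padded = np.concatenate(([0.0], hist, [0.0]))
    is_peak = ((padded[1:-1] > padded[:-2])
               & (padded[1:-1] >= padded[2:])
               & (hist > rel_threshold * hist.max()))
    return centers[is_peak]

# Synthetic stereo magnitudes: three sources, exactly one active per bin.
rng = np.random.default_rng(0)
true_angles_deg = np.array([15.5, 45.5, 75.5])      # hypothetical mixing angles
owner = rng.integers(0, 3, size=(64, 100))          # active source per bin
amp = rng.uniform(0.5, 1.0, size=owner.shape)       # source magnitude per bin
angle = np.radians(true_angles_deg)[owner]
mag_left = amp * np.cos(angle)
mag_right = amp * np.sin(angle)

centers, hist = angular_spectrum(mag_left, mag_right)
estimated_angles = pick_peaks(centers, hist)
n_sources = len(estimated_angles)                   # source count = peak count
```

In this idealised setting every time-frequency bin maps to exactly one mixing angle, so the number of peaks equals the number of sources. Real mixtures have overlapping sources, which smear the histogram and make peak picking harder.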


Two-stage blind audio source counting and separation of stereo instantaneous mixtures using Bayesian tensor factorisation
