Two-stage blind audio source counting and separation of stereo instantaneous mixtures using Bayesian tensor factorisation

In this paper, the authors address audio source counting and separation for two-channel instantaneous mixtures in two stages. First, a novel scheme is proposed for estimating the number of sources and the corresponding channel intensity difference (CID) values. For this purpose, an angular spectrum is evaluated as a function of the ratio of the magnitude spectrograms of the two channels, and the peak locations of that spectrum are obtained. In the second stage, a new approach based on Bayesian non-parametric modelling is developed for extracting the individual source signals. The mean-field variational Bayesian approach is applied to infer the unknown parameters. Classification is then performed on the inferred active CID values to obtain the individual source magnitude spectrograms. In this way, the number of spectral components used for modelling each source is found automatically from the data. The Bayesian approach is compared with the standard Kullback–Leibler non-negative tensor factorisation method to illustrate the effectiveness of Bayesian modelling. Source separation performance is measured using existing metrics for multichannel blind source separation evaluation. The experiments are performed on instantaneous mixtures from the dev2 database.
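The first stage described above (an angular spectrum built from the ratio of the two channel magnitudes, followed by peak picking) can be illustrated with a minimal sketch. The abstract does not give the exact definition of the angular spectrum, so everything specific here is an assumption made for illustration: the angle parameterisation `atan2(|X_R|, |X_L|)`, the energy weighting, the bin count, the relative threshold, and the hypothetical function names `angular_spectrum` and `pick_peaks`. It is not the authors' implementation.

```python
import math
import random

def angular_spectrum(mag_left, mag_right, n_bins=90):
    """Energy-weighted histogram of theta = atan2(|X_R|, |X_L|) over TF bins.

    mag_left / mag_right: flat sequences of non-negative magnitudes, one
    entry per time-frequency bin (an assumed, simplified input layout).
    """
    width = (math.pi / 2) / n_bins
    hist = [0.0] * n_bins
    for left, right in zip(mag_left, mag_right):
        theta = math.atan2(right, left)          # in [0, pi/2] for magnitudes
        index = min(int(theta / width), n_bins - 1)
        hist[index] += left * left + right * right
    centres = [(i + 0.5) * width for i in range(n_bins)]
    return centres, hist

def pick_peaks(centres, hist, rel_threshold=0.1):
    """Return centres of local maxima exceeding rel_threshold * max(hist)."""
    floor = rel_threshold * max(hist)
    padded = [0.0] + list(hist) + [0.0]
    return [c for i, c in enumerate(centres)
            if padded[i + 1] > floor
            and padded[i + 1] > padded[i]
            and padded[i + 1] >= padded[i + 2]]

# Synthetic check: each bin is dominated by one of three hypothetical sources.
random.seed(0)
true_angles = [0.3, 0.8, 1.3]                    # radians, illustrative only
mag_left, mag_right = [], []
for _ in range(6000):
    theta = random.choice(true_angles)
    s = 0.1 + random.random()                    # source magnitude in this bin
    mag_left.append(s * math.cos(theta))
    mag_right.append(s * math.sin(theta))

centres, hist = angular_spectrum(mag_left, mag_right)
estimated_angles = pick_peaks(centres, hist)     # one angle per detected source
source_count = len(estimated_angles)
```

On this idealised mixture, where every time-frequency bin belongs to exactly one source, the number of peaks equals the number of sources and each peak lies within one histogram bin of the true mixing angle. Real recordings have overlapping sources, so the peaks would be broader and the threshold would need tuning.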

Inspec keywords: matrix decomposition; Bayes methods; source separation

Other keywords: two-stage blind audio source counting; angular spectrum; Bayesian nonparametric modelling; Bayesian tensor factorisation; stereo instantaneous mixtures separation; channel intensity difference; individual source signals; Kullback-Leibler nonnegative tensor factorisation method; individual source magnitude spectrograms

Subjects: Signal processing and detection; Other topics in statistics; Algebra; Signal processing theory

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2014.0404