In this study, the authors propose a novel framework for audio source separation based on cascaded non-negative matrix factorisation (NMF) that uses homotopy optimisation with perturbation and ensemble (HOPE), followed by a denoising autoencoder. NMF with traditional optimisation has difficulty finding a global solution and hence cannot achieve complete separation of the sources from the mixture. This study addresses that problem using homotopy optimisation. A denoising autoencoder then removes the residual sounds that are usually observed in the separated sources. The enhanced audio signals are Wiener filtered to obtain the separated signals. The homotopy-based NMF is applied to separate singing voice and drums from single-channel mixtures of song samples. The separated signals are compared with those of other NMF algorithms using the Blind Source Separation (BSS) Eval objective quality measures. NMF with HOPE and a denoising autoencoder is shown to provide an improvement of up to 6 dB over other NMF algorithms.
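The abstract does not give the update rules or the filter design, so the following is only a minimal sketch of the two conventional building blocks it names: NMF with traditional optimisation, and Wiener-style filtering of the separated components. It assumes the standard Lee-Seung multiplicative updates for the Euclidean cost and a soft mask computed from squared component magnitudes. The function names `nmf` and `wiener_masks`, the toy matrix `V`, and the grouping of components into two sources are hypothetical, chosen for illustration. The HOPE optimisation and the denoising autoencoder are not implemented here.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (freq x time) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean cost.
    This is the conventional local optimiser, not the HOPE scheme.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_time)) + eps   # time activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def wiener_masks(W, H, groups, eps=1e-9):
    """Return one soft mask per source.

    Each mask is that source's share of the total modelled power,
    where a source is the sum of the NMF components listed in its group.
    """
    power = [(W[:, g] @ H[g, :]) ** 2 for g in groups]
    total = sum(power) + eps
    return [p / total for p in power]

# Toy magnitude "spectrogram": 64 frequency bins, 100 frames, rank 4.
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 100))

W, H = nmf(V, rank=4)
# Hypothetical assignment of components to two sources.
masks = wiener_masks(W, H, groups=[[0, 1], [2, 3]])
estimates = [m * V for m in masks]  # masked magnitude estimates
```

Because the masks sum to approximately one in every time-frequency bin, the source estimates add back up to the mixture. The quality of the split depends on the local solution the updates converge to from their random initialisation, which is the limitation the abstract attributes to traditional optimisation.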


Homotopy optimisation based NMF for audio source separation
