
Homotopy optimisation based NMF for audio source separation

In this study, the authors propose a novel framework for audio source separation based on a cascaded non-negative matrix factorisation (NMF) that uses homotopy optimisation with perturbation and ensemble (HOPE) and a denoising autoencoder. NMF with traditional optimisation often fails to find a global solution and hence cannot achieve complete separation of the sources from the mixture. This study addresses that problem using homotopy optimisation. A denoising autoencoder then removes the residual sounds that are usually observed in the separated sources, and the enhanced audio signals are Wiener filtered to obtain the separated signals. The homotopy-based NMF is applied to separate singing voice and drums from single-channel mixtures of song samples. The separated signals are compared with those of other NMF algorithms using the Blind Source Separation (BSS) Eval objective quality measures. NMF with HOPE and a denoising autoencoder is shown to provide an improvement of up to 6 dB over other NMF algorithms.
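
For orientation, the two generic building blocks named in the abstract, NMF of a magnitude spectrogram and Wiener-style masking, can be sketched as follows. This is a minimal illustration only, not the authors' method: it uses the standard Lee-Seung multiplicative updates for the Euclidean cost, which is the kind of traditional local optimiser the abstract contrasts with HOPE, and it omits both the homotopy step and the denoising autoencoder. The function names, the rank, the iteration count and the grouping of components into sources are all assumptions made for the example.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (freq x frames) as W @ H.

    Baseline Lee-Seung multiplicative updates for the Euclidean cost.
    This is a local optimiser; it is NOT the HOPE scheme from the paper.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def wiener_masks(W, H, groups, eps=1e-9):
    """Build one Wiener-style soft mask per source.

    `groups` is a hypothetical assignment of NMF components to sources,
    e.g. [[0, 1], [2, 3]]; how components are assigned is left open here.
    """
    power = [(W[:, g] @ H[g, :]) ** 2 for g in groups]
    total = sum(power) + eps
    return [p / total for p in power]


# Toy usage on a random non-negative "spectrogram" (assumed shape 16 x 30).
V = np.random.default_rng(1).random((16, 30))
W, H = nmf_multiplicative(V, rank=4)
masks = wiener_masks(W, H, groups=[[0, 1], [2, 3]])
source_estimates = [m * V for m in masks]  # masked magnitude spectrograms
```

In a real pipeline the masks would be applied to the complex short-time Fourier transform of the mixture before inversion to the time domain; that step is left out to keep the sketch self-contained.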

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2018.5093