Extraction of acoustic features based on auditory spike code and its application to music genre classification

A new method of extracting acoustic features based on an auditory spike code is proposed. An auditory spike code represents the acoustic activities created by the signal, in a manner similar to the sound encoding of the human auditory system. In the proposed method, the auditory spike code of the signal is computed using a 64-band Gammatone filterbank as the kernel functions. Then, for each spectral band, the sum and the non-zero count of the auditory spike code are computed, yielding features that correspond to the population and the occurrence rate of the acoustic activities in that band. In addition, the distribution of the acoustic activities along the time axis is analysed using the histogram of time intervals between adjacent acoustic activities, and features expressing the temporal properties of the signal are extracted. The reconstruction accuracy of the auditory spike code is also measured and used as a feature. Unlike most conventional features obtained by complex statistical modelling or learning, the features produced by the proposed method directly reflect specific acoustic characteristics of the signal. These features are applied to music genre classification, where they provide performance comparable to that of state-of-the-art features.
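The abstract only outlines the pipeline, so the following Python sketch shows one plausible reading of it, not the authors' implementation: Gammatone kernels as a dictionary, greedy matching pursuit (in the spirit of Smith and Lewicki's spike coding) to obtain the spike code, then per-band amplitude sums and spike counts, an inter-spike-interval histogram, and a reconstruction SNR as features. The function names (gammatone_kernels, matching_pursuit, spike_features), the band spacing, kernel length, stopping rule, 20-bin interval histogram, pooling of intervals across bands, and the use of SNR as the reconstruction-accuracy measure are all illustrative assumptions.

# Minimal sketch of spike-code feature extraction (assumptions noted above).
import numpy as np
from scipy.signal import fftconvolve


def gammatone_kernels(n_bands=64, fs=22050, length=1024, fmin=50.0, fmax=8000.0):
    """Unit-norm 4th-order Gammatone kernels on an (assumed) ERB-rate grid."""
    t = np.arange(length) / fs
    erb = lambda f: 24.7 * (4.37 * f / 1000.0 + 1.0)
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centres = inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(fmax), n_bands))
    kernels = []
    for fc in centres:
        g = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
        kernels.append(g / np.linalg.norm(g))
    return np.array(kernels)


def matching_pursuit(x, kernels, n_spikes=200):
    """Greedy matching pursuit: each spike is (band, time index, amplitude)."""
    residual = np.asarray(x, dtype=float).copy()
    klen = kernels.shape[1]
    spikes = []
    for _ in range(n_spikes):
        # cross-correlate the residual with every kernel (valid positions only)
        corr = np.array([fftconvolve(residual, k[::-1], mode='valid') for k in kernels])
        band, pos = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
        amp = corr[band, pos]
        if abs(amp) < 1e-6:                 # no significant structure left
            break
        residual[pos:pos + klen] -= amp * kernels[band]
        spikes.append((band, pos, amp))
    return spikes, residual


def spike_features(spikes, residual, x, n_bands=64, n_bins=20, max_isi=2048):
    """Per-band amplitude sums and spike counts, a pooled within-band
    inter-spike-interval histogram, and reconstruction SNR (dB)."""
    band_sum, band_count = np.zeros(n_bands), np.zeros(n_bands)
    times = [[] for _ in range(n_bands)]
    for band, pos, amp in spikes:
        band_sum[band] += abs(amp)          # "population" of acoustic activities
        band_count[band] += 1               # "occurrence rate" of acoustic activities
        times[band].append(pos)
    isis = np.concatenate([np.diff(np.sort(t)) for t in times if len(t) > 1] or [[]])
    hist, _ = np.histogram(isis, bins=n_bins, range=(0, max_isi))
    hist = hist / max(len(isis), 1)         # normalised interval histogram
    snr = 10.0 * np.log10(np.sum(x ** 2) / (np.sum(residual ** 2) + 1e-12))
    return np.concatenate([band_sum, band_count, hist, [snr]])


if __name__ == '__main__':
    fs = 22050
    t = np.arange(fs // 2) / fs                               # 0.5 s toy signal
    x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(t.size)
    kernels = gammatone_kernels(fs=fs)
    spikes, residual = matching_pursuit(x, kernels)
    print(spike_features(spikes, residual, x).shape)          # (149,)

For genre classification, one feature vector of this kind would typically be computed per clip (or per segment, then averaged) and fed to a standard classifier such as an SVM; the classifier choice is not specified in the abstract.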
