Extraction of acoustic features based on auditory spike code and its application to music genre classification

IET Signal Processing

A new method of extracting acoustic features based on the auditory spike code is proposed. An auditory spike code represents the acoustic activities created by a signal, in a manner similar to sound encoding in the human auditory system. In the proposed method, the auditory spike code of the signal is computed using a 64-band Gammatone filterbank as the kernel functions. Then, for each spectral band, the sum and the non-zero count of the auditory spike code are determined, and from these, features corresponding to the population and occurrence rate of the acoustic activities in each band are computed. In addition, the distribution of the acoustic activities along the time axis is analysed using the histogram of time intervals between adjacent acoustic activities, and features expressing the temporal properties of the signal are extracted. The reconstruction accuracy of the auditory spike code is also used as a feature. Unlike most conventional features, which are obtained through complex statistical modelling or learning, the features produced by the proposed method directly reflect specific acoustic characteristics of the signal. These features are applied to music genre classification, and they are shown to provide performance comparable to that of state-of-the-art features.
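The feature-extraction pipeline summarised above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the spike code is computed by greedy matching pursuit over unit-norm gammatone kernels, in the spirit of Smith and Lewicki's efficient auditory coding model; it takes the per-band "sum" as the sum of absolute spike amplitudes (normalised to a share of the total), the occurrence rate as non-zero spikes per second, and the reconstruction accuracy as an SNR in dB. The function names (`gammatone`, `spike_code`, `reconstruction_snr_db`, `spike_features`), the four-band demo (the paper uses 64 bands), the 64-sample kernel length and the histogram bin edges are all hypothetical choices for illustration.

```python
import math

def gammatone(fc, fs, length, order=4):
    """Unit-norm gammatone kernel t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    erb = 24.7 + 0.108 * fc          # Glasberg-Moore equivalent rectangular bandwidth (Hz)
    b = 1.019 * erb                  # conventional bandwidth scaling for a 4th-order gammatone
    k = [(n / fs) ** (order - 1) * math.exp(-2 * math.pi * b * n / fs)
         * math.cos(2 * math.pi * fc * n / fs) for n in range(length)]
    norm = math.sqrt(sum(v * v for v in k))
    return [v / norm for v in k]

def spike_code(x, kernels, max_spikes, min_amp=1e-9):
    """ASSUMPTION: greedy matching pursuit. Returns {(band, onset): amplitude} and the residual."""
    residual = list(x)
    code = {}
    for _ in range(max_spikes):
        best_c, best_m, best_tau = 0.0, 0, 0
        for m, k in enumerate(kernels):
            for tau in range(len(x) - len(k) + 1):
                c = sum(kv * residual[tau + i] for i, kv in enumerate(k))
                if abs(c) > abs(best_c):
                    best_c, best_m, best_tau = c, m, tau
        if abs(best_c) < min_amp:    # nothing left that any kernel explains
            break
        for i, kv in enumerate(kernels[best_m]):
            residual[best_tau + i] -= best_c * kv   # subtract the chosen atom
        key = (best_m, best_tau)
        code[key] = code.get(key, 0.0) + best_c
    return code, residual

def reconstruction_snr_db(x, residual):
    """SNR of the reconstruction x_hat = x - residual relative to x, in dB."""
    err = sum(v * v for v in residual)
    return math.inf if err == 0 else 10 * math.log10(sum(v * v for v in x) / err)

def spike_features(code, n_bands, n_samples, fs, isi_edges_s):
    """Per-band population and rate, plus a normalised inter-spike-interval histogram."""
    active = {key: a for key, a in code.items() if a != 0.0}
    total = sum(abs(a) for a in active.values()) or 1.0
    population = [0.0] * n_bands
    counts = [0] * n_bands
    for (m, _), a in active.items():
        population[m] += abs(a) / total          # ASSUMPTION: band's share of total |amplitude|
        counts[m] += 1                           # non-zero entries of the code in this band
    rate = [c / (n_samples / fs) for c in counts]   # spikes per second
    onsets = sorted(t for (_, t) in active)      # all spikes pooled across bands
    isis = [(b - a) / fs for a, b in zip(onsets, onsets[1:])]
    hist = [0] * (len(isi_edges_s) - 1)
    for d in isis:
        for j in range(len(hist)):
            if isi_edges_s[j] <= d < isi_edges_s[j + 1]:
                hist[j] += 1
                break
    n = sum(hist) or 1
    return population, rate, [h / n for h in hist]

# Demo: plant two known atoms and recover them (hypothetical 4-band setup).
fs = 16000
kernels = [gammatone(fc, fs, 64) for fc in (500, 1000, 2000, 4000)]
x = [0.0] * 200
for i, v in enumerate(kernels[1]):
    x[20 + i] += 1.0 * v
for i, v in enumerate(kernels[3]):
    x[100 + i] -= 0.5 * v
code, residual = spike_code(x, kernels, max_spikes=10)
snr = reconstruction_snr_db(x, residual)
pop, rate, hist = spike_features(code, 4, len(x), fs, [0, 0.002, 0.004, 0.008, 0.016])
```

The brute-force correlation loop costs bands × positions × kernel length per spike, which is acceptable only because the demo signal is tiny; a realistic 64-band version would compute the correlations with FFT-based convolution instead.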


