Bird and whale species identification using sound images

Author(s): Loris Nanni¹; Rafael L. Aguiar²,³; Yandre M.G. Costa²; Sheryl Brahnam⁴; Carlos N. Silla Jr.³; Ricky L. Brattin⁴; Zhao Zhao⁵,⁶
- Affiliations: 1: DEI, University of Padua, Italy;
  2: PCC/DIN, State University of Maringá, Maringá, Brazil;
  3: PPGIA, Pontifical Catholic University of Paraná, Curitiba, Brazil;
  4: CIS, Missouri State University, Springfield, USA;
  5: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, People's Republic of China;
  6: Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907, USA
Source: Volume 12, Issue 2, March 2018, pp. 178–184
DOI: 10.1049/iet-cvi.2017.0075, Print ISSN 1751-9632, Online ISSN 1751-9640

Received 31/01/2017, Revised 09/11/2017, Accepted 14/11/2017, Published 27/11/2017

Image-based identification of animals mostly relies on their appearance, but images can be used in other ways as well, including to represent the sounds that animals make. In this study, the authors present a novel and effective approach for the automated identification of birds and whales using some of the best texture descriptors in the computer vision literature. The visual features of a sound are built from the audio file: they are taken from images constructed from different spectrograms and from harmonic and percussion images. These images are divided into sub-windows, from which sets of texture descriptors are extracted. Experiments on a dataset of bird vocalisations targeted for species recognition and a dataset of right whale calls targeted for whale detection (as well as three well-known benchmarks for music genre classification) demonstrate that the fusion of different texture features enhances performance. The experiments also demonstrate that the fusion of different texture features with audio features is not only comparable with existing audio signal approaches but also yields a statistically significant improvement over some of the stand-alone audio features. The code for the experiments will be publicly available at https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0.
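The pipeline outlined in the abstract (audio to spectrogram image, image to sub-windows, sub-windows to texture descriptors) can be illustrated with a minimal sketch. It is not the authors' implementation: the basic 8-neighbour local binary pattern (LBP) stands in for the unspecified texture descriptors, the split into three horizontal frequency zones is an assumed sub-window layout, and the function names `spectrogram`, `lbp_histogram` and `zone_features` are hypothetical. The naive DFT uses only the standard library so that the example is self-contained; a practical version would rely on an FFT library.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram in dB from a naive Hann-windowed DFT.

    Returns a list of rows, one per frequency bin, with one value per frame.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        col = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            col.append(20 * math.log10(abs(acc) + 1e-10))
        columns.append(col)
    # Transpose so that rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*columns)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def zone_features(image, n_zones=3):
    """Split the image into horizontal frequency zones (an assumed layout)
    and concatenate the LBP histogram of each zone."""
    rows = len(image)
    bounds = [round(i * rows / n_zones) for i in range(n_zones + 1)]
    features = []
    for lo, hi in zip(bounds, bounds[1:]):
        features.extend(lbp_histogram(image[lo:hi]))
    return features


# A synthetic 440 Hz tone sampled at 8 kHz stands in for a real recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(2048)]
spec = spectrogram(tone)
features = zone_features(spec)
```

The resulting `features` vector concatenates one histogram per zone and could be passed to any classifier. Fusing several descriptors, as the abstract describes, would amount to repeating the last step with other descriptor functions and combining the outputs.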

