Multimodal framework based on audio-visual features for summarisation of cricket videos
- Author(s): Ali Javed (1); Aun Irtaza (2); Hafiz Malik (3); Muhammad Tariq Mahmood (4); Syed Adnan (2)
- Affiliations:
  1: Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
  2: Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan
  3: Department of Electrical and Computer Engineering, University of Michigan, Dearborn, USA
  4: School of Computer Science and Information Engineering, Korea University of Technology and Education, 1600 Chungjeolno, Byeogchunmyun, Cheonan, Republic of Korea
- Source: IET Image Processing, Volume 13, Issue 4, 28 March 2019, pp. 615–622
- DOI: 10.1049/iet-ipr.2018.5589; Print ISSN 1751-9659; Online ISSN 1751-9667
Sports broadcasters generate an enormous amount of video content in cyberspace owing to massive viewership all over the world. The scale of this repository compels broadcasters to apply video summarisation, extracting the exciting segments from the full video to capture viewers' interest and to reap storage and transmission benefits. This study therefore presents an automatic method for key-event detection and summarisation of cricket videos based on audio-visual features. Acoustic local binary pattern features capture the excitement level in the audio stream and are used to train a binary support vector machine (SVM) classifier. The trained SVM classifier labels each audio frame as excited or non-excited, and the excited audio frames are used to select candidate key-video frames. A decision tree-based classifier is then trained to detect key events in the input cricket videos, which are used for video summarisation. The performance of the proposed framework has been evaluated on a diverse dataset of cricket videos drawn from different tournaments and broadcasters. Experimental results show that the proposed method achieves an average accuracy of 95.5%, which signifies its effectiveness.
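The audio stage of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the 1-D local binary pattern operator, the frame length, the neighbourhood radius, the toy "excited"/"non-excited" signals, and the linear SVM kernel are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.svm import SVC

def lbp_1d(signal, radius=4):
    """1-D local binary pattern: compare each sample against its `radius`
    neighbours on either side and pack the sign comparisons into a code."""
    n = len(signal)
    codes = np.empty(n - 2 * radius, dtype=int)
    for i in range(radius, n - radius):
        neighbours = np.concatenate((signal[i - radius:i],
                                     signal[i + 1:i + 1 + radius]))
        bits = (neighbours >= signal[i]).astype(int)
        codes[i - radius] = int("".join(map(str, bits)), 2)
    return codes

def lbp_histogram(frame, radius=4):
    """Normalised histogram of LBP codes over one audio frame, used as the
    frame's feature vector."""
    n_codes = 2 ** (2 * radius)
    hist, _ = np.histogram(lbp_1d(frame, radius), bins=n_codes,
                           range=(0, n_codes), density=True)
    return hist

# Toy data (NOT the paper's dataset): "excited" frames modelled as smooth
# high-energy oscillations, "non-excited" frames as white noise.
rng = np.random.default_rng(0)
excited = [np.sin(np.linspace(0, 40, 400) + rng.uniform(0, 2 * np.pi))
           + 0.02 * rng.standard_normal(400) for _ in range(20)]
calm = [rng.standard_normal(400) for _ in range(20)]

X = np.array([lbp_histogram(f) for f in excited + calm])
y = np.array([1] * 20 + [0] * 20)

# Binary SVM over the LBP histograms, mirroring the excitement-detection
# stage; frames it flags as excited would feed key-frame selection.
clf = SVC(kernel="linear").fit(X, y)
print("training accuracy:", clf.score(X, y))
```

In the paper's full pipeline, the frames labelled excited here would nominate candidate key-video frames, and a second, decision tree-based classifier over visual features of those frames would then detect the key events used to build the summary.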
Inspec keywords: video signal processing; decision trees; feature extraction; audio signal processing; pattern classification; sport; support vector machines
Other keywords: candidate key-video frames; video content; binary support vector machine classifier; input cricket videos; key-events detection; audio frame; sports broadcasters; entire video; audio-visual features; acoustic local binary pattern features; storage; excitement level; exciting segments; decision tree-based classifier; trained SVM classifier; video summarisation; transmission benefits; excited audio frames; audio stream; multimodal framework
Subjects: Knowledge engineering techniques; Optical, image and video signal processing; Other topics in statistics; Computer vision and image processing techniques; Video signal processing; Speech and audio signal processing; Combinatorial mathematics