Extracting statistically significant behaviour from fish tracking data with and without large dataset cleaning

Extracting a statistically significant result from video of natural phenomena can be difficult for two reasons: (i) there can be considerable natural variation in the observed behaviour and (ii) computer vision algorithms applied to natural phenomena may not perform correctly on a significant number of samples. This study presents one approach to cleaning a large, noisy visual tracking dataset so that statistically sound results can be extracted from the image data. In particular, analyses of 3.6 million underwater trajectories of a fish, paired with the water temperature at the time of acquisition, are presented. Although there are many false detections and incorrect trajectory assignments, a combination of data binning and robust estimation methods yields reliable evidence that fish speed increases as water temperature increases. Then, a data cleaning method is proposed that uses a deep learning-based clustering algorithm to remove outliers arising from false detections and incorrect trajectory assignments. The results on the cleaned data also show a rise in fish speed as temperature goes up. Several statistical tests applied to both the cleaned and not-cleaned data confirm that both results are statistically significant and show an increasing trend. However, the latter approach also generates a cleaner dataset suitable for other analyses.
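The idea of combining data binning with a robust estimator can be illustrated with a minimal sketch. The abstract does not state the bin widths or the estimators actually used, so the 1-degree bins, the choice of the median, the function name `binned_median_speed`, and the sample values below are all assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import median


def binned_median_speed(samples, bin_width=1.0):
    """Group (temperature, speed) pairs into temperature bins and
    return {bin_lower_edge: median speed}, ordered by bin edge."""
    bins = defaultdict(list)
    for temperature, speed in samples:
        # Lower edge of the temperature bin this sample falls into.
        edge = (temperature // bin_width) * bin_width
        bins[edge].append(speed)
    return {edge: median(speeds) for edge, speeds in sorted(bins.items())}


# Made-up samples: (water temperature in deg C, speed in arbitrary units).
# The 50.0 and 45.0 speeds stand in for false detections or bad
# trajectory assignments; they are not values from the study.
samples = [
    (24.2, 1.0), (24.7, 1.1), (24.9, 50.0),
    (25.1, 1.3), (25.5, 1.4), (25.8, 1.2),
    (26.3, 1.6), (26.4, 1.7), (26.9, 45.0),
]

medians = binned_median_speed(samples)
print(medians)
```

In this toy data the per-bin median ignores the two extreme speeds, so the increasing trend across bins remains visible, whereas a per-bin mean would be dominated by the outliers.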

Inspec keywords: computer vision; pattern classification; image denoising; data handling; learning (artificial intelligence); estimation theory; aquaculture

Other keywords: fish tracking data; computer vision; image data; statistical extraction; dataset cleaning; natural phenomena; data binning

Subjects: Other topics in statistics; Agriculture, forestry and fisheries computing; Optical, image and video signal processing; Computer vision and image processing techniques; Information technology applications; Statistics; Agriculture; Data handling techniques; Knowledge engineering techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2016.0462