Joint optimisation convex-negative matrix factorisation for multi-modal image collection summarisation based on images and tags


Journal: IET Computer Vision

Image collection summarisation aims to represent a large-scale multi-modal collection with a small subset of images and tags, making a large image dataset easier to navigate. Most existing methods exploit the contribution of text to the visual summary while ignoring the contribution of visual content to the textual topic. When the tags are weakly labelled, the textual topic cannot accurately reflect the visual summary. To solve this, the authors propose a novel model, joint optimisation of convex non-negative matrix factorisation, which incorporates images and tags in a mutually beneficial way. The objective function contains a visual and a textual error function that share the same indicator matrix, which connects the two modalities. The authors then propose an iterative algorithm to optimise the model. Finally, they explore the effects of different visual feature representations (e.g. bag-of-words and deep learning) on the multi-modal collection summary. The proposed method is compared with state-of-the-art algorithms on two multi-modal datasets (i.e. MIRFlickr and NUS-WIDE-SCENE). Experimental results demonstrate the effectiveness of the proposed approach.
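
The abstract does not give the objective function or the update rules, so the following is only a minimal sketch of what a joint convex non-negative matrix factorisation with a shared indicator matrix could look like. Everything in it is an assumption made for illustration: the names X_v, X_t, W_v, W_t, G and the weight lam are hypothetical, the features are assumed to be non-negative with one column per image, and the multiplicative updates are the standard convex-NMF updates for non-negative data extended to two modalities, not necessarily the iterative algorithm the authors propose.

```python
import numpy as np


def joint_objective(X_v, X_t, W_v, W_t, G, lam=1.0):
    """Visual error plus lam times textual error, both using the shared G."""
    err_v = np.linalg.norm(X_v - X_v @ W_v @ G.T, "fro") ** 2
    err_t = np.linalg.norm(X_t - X_t @ W_t @ G.T, "fro") ** 2
    return err_v + lam * err_t


def fit_joint_convex_nmf(X_v, X_t, k, lam=1.0, n_iter=100, eps=1e-9, seed=0):
    """Illustrative solver (an assumption, not the paper's algorithm).

    X_v: (d_v, n) non-negative visual features, one column per image.
    X_t: (d_t, n) non-negative tag features, one column per image.
    Returns W_v, W_t, the shared indicator G (n, k) and the loss history.
    """
    rng = np.random.default_rng(seed)
    n = X_v.shape[1]
    W_v = rng.random((n, k))
    W_t = rng.random((n, k))
    G = rng.random((n, k))

    # Gram matrices; non-negative because the features are assumed non-negative.
    Y_v = X_v.T @ X_v
    Y_t = X_t.T @ X_t

    history = [joint_objective(X_v, X_t, W_v, W_t, G, lam)]
    for _ in range(n_iter):
        # Shared indicator: numerator and denominator sum over both modalities.
        num = Y_v @ W_v + lam * (Y_t @ W_t)
        den = G @ (W_v.T @ Y_v @ W_v) + lam * (G @ (W_t.T @ Y_t @ W_t))
        G *= np.sqrt(num / (den + eps))

        # Per-modality weights, each updated against the shared G.
        GtG = G.T @ G
        W_v *= np.sqrt((Y_v @ G) / (Y_v @ W_v @ GtG + eps))
        W_t *= np.sqrt((Y_t @ G) / (Y_t @ W_t @ GtG + eps))

        history.append(joint_objective(X_v, X_t, W_v, W_t, G, lam))
    return W_v, W_t, G, history


if __name__ == "__main__":
    # Synthetic data only; real use would plug in image and tag features.
    rng = np.random.default_rng(1)
    X_v = rng.random((20, 30))
    X_t = rng.random((8, 30))
    W_v, W_t, G, history = fit_joint_convex_nmf(X_v, X_t, k=4, n_iter=50)
    print("initial loss:", history[0], "final loss:", history[-1])
```

Because both error terms are reconstructed through the same G, an image cannot be assigned to one group by its visual features and to another by its tags, which is one way to read the abstract's remark that the shared indicator matrix connects the modalities.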


