Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis

Multi-modal sentiment analysis (MSA) is increasingly becoming a research hotspot because it extends conventional text-based sentiment analysis (SA) to multi-modal content, which can provide richer affective information. However, compared with text-based sentiment analysis, multi-modal sentiment analysis poses far greater challenges, because joint learning over multi-modal data requires both fine-grained semantic matching and effective heterogeneous feature fusion. Existing approaches generally infer sentiment by splicing together features extracted from different modalities, but neglect the strong semantic correlation among co-occurring data from different modalities. To address these challenges, a multi-level deep correlative network for multi-modal sentiment analysis is proposed, which reduces the semantic gap by simultaneously analysing the middle-level semantic features of images and the hierarchical deep correlations. First, the most relevant cross-modal feature representation is generated with multi-modal deep and discriminative correlation analysis (Multi-DDCA), while keeping the feature representation of each modality discriminative. Second, the high-level semantic outputs of Multi-DDCA are encoded into an attention-correlation cross-modal feature representation through a co-attention-based multi-modal correlation submodel, and are then further merged by a multi-layer neural network to train a sentiment classifier that predicts sentiment categories. Extensive experimental results on five datasets demonstrate the effectiveness of the designed approach, which outperforms several state-of-the-art fusion strategies for sentiment analysis.
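The co-attention-based fusion step described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the affinity weight matrix `W`, the max-pooling over the affinity matrix, and the random inputs are all simplifying assumptions made here for illustration; in the actual model these features would come from the Multi-DDCA stage and `W` would be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(T, V, W):
    """Simplified co-attention between text token features T (n x d)
    and image region features V (m x d). W (d x d) stands in for a
    learnable affinity weight matrix (random here, for illustration)."""
    C = np.tanh(T @ W @ V.T)                # n x m cross-modal affinity matrix
    a_t = softmax(C.max(axis=1))            # attention weights over text tokens
    a_v = softmax(C.max(axis=0))            # attention weights over image regions
    t_att = a_t @ T                         # attended text vector, shape (d,)
    v_att = a_v @ V                         # attended image vector, shape (d,)
    return np.concatenate([t_att, v_att])   # fused 2d-dim representation

# Toy example: 5 text tokens, 7 image regions, 16-dim features.
rng = np.random.default_rng(0)
n, m, d = 5, 7, 16
T = rng.standard_normal((n, d))
V = rng.standard_normal((m, d))
W = rng.standard_normal((d, d)) * 0.1
fused = co_attention(T, V, W)
print(fused.shape)  # (32,)
```

The fused vector would then be fed to a multi-layer neural network classifier, as the abstract describes; the pooling choice (max over the affinity matrix) is one common variant among several used in co-attention models.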

Inspec keywords: feature extraction; learning (artificial intelligence); image representation; image fusion; pattern classification; text analysis

Other keywords: multimodal sentiment analysis; multi-level deep correlative networks; attention-correlation cross-modal feature representation; text-based sentiment analysis; respective modal feature representations; multimodal data; conventional sentiment analysis; relevant cross-modal feature representation; multimodal deep and discriminative correlation analysis; multimodal content; multilayer neural network

Subjects: Data handling techniques; Computer vision and image processing techniques; Optical, image and video signal processing; Document processing and analysis techniques; Knowledge engineering techniques; Image recognition

http://iet.metastore.ingenta.com/content/journals/10.1049/cje.2020.09.003