Information Science (情报科学), 2022, Vol. 40, Issue (3): 117-125.

• Professional Research •

Research on Multimodal Emotion Recognition Based on the DR-Transformer Model

  

  • Online: 2022-03-01 Published: 2022-03-08

Abstract: 【Purpose/Significance】This paper fuses multimodal text and image information for emotion recognition, using the image modality to enrich emotional semantics, with the aim of addressing cases where text alone cannot accurately determine emotional polarity. 【Method/Process】Using posts published by users on Sina Weibo as the experimental data, the paper proposes a multimodal emotion recognition algorithm based on the DR-Transformer model. Pre-trained DenseNet and RoBERTa models extract emotional features from the image and text modalities, respectively; a Modal Embedding mechanism marks the modality each feature comes from; and a shallow Transformer Encoder fuses the emotional features of the different modalities, with the Self-Attention mechanism dynamically adjusting the weight of each modality's features. 【Result/Conclusion】Experiments on the Weibo dataset show that the model reaches an emotion recognition accuracy of 79.84%. Compared with sentiment classification algorithms based on the text modality alone and on the image modality alone, accuracy improves by 4.74% and 19.05%, respectively; compared with a feature fusion method that directly concatenates the feature vectors of the different modalities, accuracy improves by 1.12%. These results indicate that the model is sound, reasonable, and effective for emotion recognition. 【Innovation/Limitation】The Modal Embedding and Self-Attention mechanisms can effectively fuse multimodal information. The Weibo online public opinion dataset still needs to be expanded.
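The fusion step described in the abstract (adding a Modal Embedding to each modality's feature vector, then letting self-attention weight the modalities) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes each modality has already been reduced to one small feature vector (standing in for DenseNet and RoBERTa outputs), uses a single attention head with no learned projections (queries, keys, and values are the tokens themselves), and mean-pools the result. All function names, vectors, and embedding values are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_modal_embedding(feature, modal_embedding):
    """Tag a feature vector with its modality by element-wise addition."""
    return [f + e for f, e in zip(feature, modal_embedding)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: no learned projections, so Q = K = V = tokens.
    Returns the attended tokens and the attention weight matrix.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][i] for j in range(len(tokens)))
             for i in range(dim)]
        )
    return outputs, weights


def fuse(text_feat, image_feat, text_emb, image_emb):
    """Fuse one text vector and one image vector into a single vector."""
    tokens = [
        add_modal_embedding(text_feat, text_emb),
        add_modal_embedding(image_feat, image_emb),
    ]
    attended, weights = self_attention(tokens)
    dim = len(attended[0])
    # Mean-pool the attended tokens (an assumption; other poolings are possible).
    fused = [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]
    return fused, weights


# Hypothetical 4-dimensional features and modality embeddings.
TEXT_FEAT = [0.2, -0.1, 0.4, 0.3]
IMAGE_FEAT = [0.1, 0.5, -0.2, 0.0]
TEXT_EMB = [1.0, 0.0, 0.0, 0.0]
IMAGE_EMB = [0.0, 1.0, 0.0, 0.0]

if __name__ == "__main__":
    fused_vector, attention = fuse(TEXT_FEAT, IMAGE_FEAT, TEXT_EMB, IMAGE_EMB)
    print("fused:", fused_vector)
    print("attention weights:", attention)
```

Each row of the attention matrix sums to 1, so it can be read as how strongly one modality's token draws on the text and image tokens for this input. A trained model would add learned query, key, and value projections, multiple heads, and a classification layer on top of the fused vector.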