Information Science (情报科学), 2024, Vol. 42, Issue (6): 89-98.

• Business Research •

Research on Misinformation Identification in Social Media Based on a BERT-BiLSTM Hybrid Model

  

  • Online: 2023-06-01  Published: 2024-07-31

Abstract: 【Purpose/Significance】 Set against the infodemic that accompanies public health events, this study explores the thematic features of real and false information on social media, examines the characteristics of comment information on social media platforms and the problem of judging whether it is authentic, and provides a reference for users and platforms when identifying information. 【Method/Process】 Using epidemic-related, multi-topic tweets from Twitter as the dataset, an LDA topic model is applied to extract the main expressions and semantic features of real and false information. A BERT preprocessing method is then combined with a bidirectional long short-term memory network (BiLSTM) to build a BERT-BiLSTM hybrid model that identifies false epidemic information. 【Result/Conclusion】 The LDA-based comparison shows significant differences between real and false information in both topics and modes of expression, which offers a reference for identifying false information. Compared with traditional machine learning algorithms, the BERT-BiLSTM model performs significantly better at identifying epidemic misinformation, reaching an accuracy of 0.960 and an F1 score of 0.961, and therefore offers a more accurate and efficient solution for misinformation recognition. 【Innovation/Limitation】 Taking epidemic information on social media platforms as its research object, the study uses the LDA topic model to characterize real and false epidemic information and achieves effective identification of multi-topic data at low cost on a small-scale dataset, providing an efficient solution for infodemic governance.
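The abstract states that an LDA topic model was used to extract the main expressions and semantic features of real and false information, but it does not say which implementation or settings were used. As a minimal sketch under that uncertainty, the block below fits LDA by collapsed Gibbs sampling in plain Python. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha`, `beta` and `iters`, and the toy token lists are illustrative assumptions, not the authors' code.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper, not the paper's code).

    docs: list of token lists (already cleaned and tokenised tweets).
    Returns (topic_word, doc_topic, vocab): final assignment counts
    topic_word[k][w], doc_topic[d][k] and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topic_word = [[0] * V for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics
    z = []  # z[d][i]: topic currently assigned to token i of document d

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            topic_word[k][w2id[w]] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = w2id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][wid] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1
                # Full conditional: p(z = t | rest) ∝ (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][wid] + beta)
                           / (topic_total[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                topic_word[k][wid] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=5):
    """Highest-count words per topic, i.e. each topic's main expressions."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in topic_word]


if __name__ == "__main__":
    # Hypothetical toy tokens standing in for one corpus (e.g. the false tweets).
    toy_docs = [["cure", "remedy", "virus"], ["rumor", "virus", "spread"],
                ["remedy", "cure", "cure"]]
    tw, dt, vocab = lda_gibbs(toy_docs, num_topics=2, iters=50)
    print(top_words(tw, vocab, n=3))
```

Fitting one model per corpus (real tweets, false tweets) and comparing the top words of each topic mirrors the comparison described in the abstract. On real data, a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be the practical choice.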
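The abstract describes BERT as a preprocessing (encoding) stage whose output feeds a BiLSTM. One plausible reading, sketched below in PyTorch, passes BERT's per-token hidden states through a single-layer BiLSTM and classifies the concatenated final forward and backward states as real or false. Layer sizes, dropout, this pooling choice, and the class names `BertBiLSTMClassifier` and `StandInEncoder` are assumptions; `StandInEncoder` is a hypothetical embedding layer that exists only so the sketch runs offline.

```python
from types import SimpleNamespace

import torch
from torch import nn


class BertBiLSTMClassifier(nn.Module):
    """BERT token vectors -> BiLSTM -> real/false logits (illustrative sketch)."""

    def __init__(self, encoder, bert_hidden=768, lstm_hidden=128, num_classes=2, dropout=0.1):
        super().__init__()
        # Assumed: encoder(...) returns an object exposing .last_hidden_state.
        self.encoder = encoder
        self.bilstm = nn.LSTM(bert_hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # [batch, seq_len, bert_hidden] contextual embeddings from the encoder.
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Pack so the LSTM only sees each tweet's real tokens, not the padding.
        lengths = attention_mask.sum(dim=1).cpu()
        packed = nn.utils.rnn.pack_padded_sequence(
            hidden, lengths, batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.bilstm(packed)
        # h_n: [2, batch, lstm_hidden]; join final forward and backward states.
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(self.dropout(sentence))


class StandInEncoder(nn.Module):
    """Hypothetical offline stand-in for BertModel (plain embedding lookup)."""

    def __init__(self, vocab_size=100, hidden=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids, attention_mask):
        return SimpleNamespace(last_hidden_state=self.embed(input_ids))
```

To use a real encoder, pass `BertModel.from_pretrained("bert-base-uncased")` from Hugging Face `transformers` as `encoder` with `bert_hidden=768`, and build `input_ids` and `attention_mask` with the matching `BertTokenizer` (`padding=True, return_tensors="pt"`); the checkpoint name is an assumption, since the abstract does not name one. Packing the padded batch means padding tokens never reach the LSTM, so the backward direction starts from each tweet's last real token.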
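The reported accuracy (0.960) and F1 score (0.961) are standard classification metrics. The helper below shows how they are usually computed for a binary real/false task. It assumes that label `1` marks false information and that F1 is the binary F1 of that class; the abstract does not say whether a binary, macro or weighted F1 was reported. `accuracy_and_f1` is a hypothetical helper name.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1, treating `positive` (assumed 1 = false info) as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

For `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` it returns an accuracy of 0.6 and an F1 of about 0.667, since precision and recall are both 2/3. The results should agree with scikit-learn's `accuracy_score` and `f1_score` under the default `average="binary"`.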