Information Science (情报科学), 2024, Vol. 42, Issue 2: 35-42.

• Theoretical Research •

Research on Rumor Identification in Public Health Emergencies Based on Information Credibility Evaluation

  

  • Online: 2024-02-05 Published: 2024-06-07

Abstract:

【Purpose/significance】As a public health emergency evolves, public understanding of it moves from vague to precise.
This study quantifies and grades the credibility of online information about public health emergencies to provide data
support for finer-grained rumor identification.【Method/process】Five static features of the information text (keywords,
sentiment, comments, information source, and media) were selected and combined with two dynamic features (time and the
number of newly confirmed cases on the same day). These features were quantified with the entropy method to obtain a
rumor index, RI. On this basis, a "tolerance interval" was introduced, with its boundaries determined by a naive Bayes
classifier, and the credibility of the rumor identification results was divided into three classes: low, medium, and
high.【Result/conclusion】The model performed well on the training set and the validation set, with accuracies of 95% and
90.20%, respectively. Compared with two baseline models, a decision tree and an SVM, the model showed significant
improvement on all performance metrics.【Innovation/limitation】This study establishes a rumor identification model based
on information credibility evaluation. Constructing the RI index and incorporating dynamic indicators of the epidemic
situation improves the accuracy of rumor identification. Grading rumors by credibility overcomes the limitation of
traditional rumor identification, which treats detection as a binary classification problem.
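
The entropy method named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each feature has already been scored numerically, that a larger score means "more rumor-like", and that a higher RI corresponds to lower credibility. The function names, the sample matrix, and the `lower`/`upper` bounds of the tolerance interval are hypothetical; in the paper the bounds are determined by a naive Bayes classifier, which is not reproduced here.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method (illustrative sketch).

    X: (n_samples, n_features) array of feature scores, n_samples >= 2,
       with at least one feature that varies across samples (assumption).
    Returns the min-max normalised matrix Z and one weight per feature.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Min-max normalise each feature to [0, 1]; constant columns become all zeros.
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)
    # Share of each sample within a feature; constant columns get a uniform share.
    col_sum = Z.sum(axis=0)
    P = np.where(col_sum > 0, Z / np.where(col_sum > 0, col_sum, 1.0), 1.0 / n)
    # Shannon entropy per feature, using the convention 0 * log(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    entropy = -(P * logP).sum(axis=0) / np.log(n)
    # Features with lower entropy (more dispersion) receive larger weights.
    divergence = 1.0 - entropy
    return Z, divergence / divergence.sum()

def rumor_index(X):
    """Weighted sum of normalised features, one RI value per sample."""
    Z, w = entropy_weights(X)
    return Z @ w

def credibility_level(ri, lower, upper):
    """Map an RI value to a credibility class.

    [lower, upper] plays the role of the tolerance interval; both bounds
    are hypothetical inputs here, not values from the paper.
    """
    if ri < lower:
        return "high"    # low rumor index: assumed high credibility
    if ri > upper:
        return "low"     # high rumor index: assumed low credibility
    return "medium"

# Hypothetical scores for four messages and three features.
scores = np.array([[1, 5, 2],
                   [2, 5, 4],
                   [3, 5, 9],
                   [4, 5, 1]])
ri_values = rumor_index(scores)
levels = [credibility_level(r, 0.3, 0.7) for r in ri_values]
```

The middle column in `scores` is constant, so it carries no information and its weight is zero up to floating-point rounding. The three-way split is what distinguishes this scheme from a binary rumor/non-rumor decision.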