Information Science (情报科学)

Research on the Construction of a Recognition Model for False Online Medical Information: A Pre-trained BERT-Based Model

1. School of Information Management, Nanjing University, Nanjing, Jiangsu 210023; 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing, Jiangsu 210023

Abstract:

[Purpose/significance] This study addresses the lack of professional knowledge that hampers the collection of datasets of false online medical information, and supports the construction of recognition models for such information in small-sample domains. [Method/process] We propose constructing a dataset of false online medical information by transforming and extracting content from authoritative rumor-refuting information, and then build traditional machine learning models, a CNN model, and a BERT model in turn for classification. [Result/conclusion] The results show that rumor-refuting information makes it possible to build a false medical information dataset at low cost and without relying on expert labeling. Comparative experiments on a dataset of false online medical information related to COVID-19 show that the BERT model pre-trained on Weibo data reaches an accuracy of 95.91% and an F1 value of 94.57%, improvements of close to 6% and 4% over the traditional machine learning models and the CNN model, respectively. This indicates that the pre-trained BERT-based model constructed in this paper performs better on the task of recognizing false online medical information.

Key words:

Misinformation detection, False medical information, BERT model, Deep learning, Online medical information
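
For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how accuracy and F1 can be computed for a binary false-information classifier. It is illustrative only: the function name `accuracy_and_f1`, the toy labels, and the choice of binary F1 on the "false information" class (label 1) are assumptions, since the abstract does not state which F1 variant the authors used.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class treated as "false information".
    Assumption: binary F1 on the positive class; the paper may use another variant.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels (hypothetical, not data from the paper): 1 = false information, 0 = other.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```

Reporting both values is useful because accuracy alone can look high on an imbalanced dataset, whereas F1 reflects how well the false-information class itself is recovered.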