Information Science ›› 2021, Vol. 39 ›› Issue (3): 51-59.

• Theoretical Research •

Research on Knowledge Graph Construction from Question-and-Answer Text in Online Medical Communities

  

  • Online: 2021-03-01 Published: 2021-03-15

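Summary:

【Purpose/significance】Data in medical question-and-answer communities is large in volume, poorly standardized, and
sparse. To address these characteristics, this study combines deep learning models, including the bidirectional long
short-term memory network (BiLSTM), the conditional random field (CRF), and the bidirectional gated recurrent unit
(BiGRU), to investigate entity recognition and relation extraction methods for community text.【Method/process】First,
entities are further subdivided, and a BiLSTM-CRF model performs entity recognition on a BIO-annotated data set;
experiments show that subdivided entities give better results than non-subdivided ones. Next, a BiGRU-Attention model
extracts the relations between entities; the experimental results show that this model improves considerably on the
BiLSTM-Attention extraction model in accuracy, recall, and F value. Finally, a visual knowledge graph is built with the
Neo4j graph database.【Result/conclusion】This study converts unstructured community text into structured data, which
helps advance intelligent knowledge services, knowledge representation, and personalized knowledge recommendation in
medical communities.【Innovation/limitation】Entities are subdivided during medical entity recognition, and a breast
cancer knowledge graph based on online medical community question-and-answer text is successfully constructed. However,
because some relation types have few samples, the overall relation extraction evaluation metrics are affected to some
extent.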

Abstract:

【Purpose/significance】This paper studies a knowledge graph construction method for medical question-and-answer
communities. Because data in these communities is large in volume, poorly standardized, and sparse, the paper combines
the bidirectional long short-term memory network (BiLSTM), the conditional random field (CRF), the bidirectional gated
recurrent unit (BiGRU), and other deep learning models to study entity recognition and relation extraction methods for
community text.【Method/process】First, the entities are further subdivided, and a BiLSTM-CRF model is used to recognize
entities in a BIO-annotated data set. The experiments show that subdivided entities perform better than non-subdivided
ones. Then the relations between entities are extracted with a bidirectional gated recurrent unit combined with an
attention mechanism (BiGRU-Attention).【Result/conclusion】The experimental results show that this model improves
considerably on the BiLSTM-Attention extraction model in accuracy, recall, and F value. Finally, a visual knowledge
graph is constructed with the Neo4j graph database. This research transforms unstructured community text into
structured data, which promotes intelligent knowledge services, knowledge representation, and personalized knowledge
recommendation in medical communities.【Innovation/limitation】Entities are subdivided during medical entity
recognition, and a breast cancer knowledge graph based on online medical community question-and-answer text is
successfully constructed. However, because some relation types have few samples, the evaluation metrics for overall
relation extraction are affected to some extent.
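
As an illustration of the BIO annotation and the recall and F value metrics mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It decodes a BIO tag sequence into entity spans and scores predictions against gold spans by exact match. The entity labels DIS and SYM, the function names, and the exact-match span scoring are assumptions made for illustration; the abstract does not give the actual tag set or the scoring procedure, and the sketch computes precision where the abstract says accuracy.

```python
from typing import List, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence such as B-DIS, I-DIS, O."""
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Exact-match span scoring: a prediction counts only if type and both ends match."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical labels: DIS for a disease mention, SYM for a symptom mention.
gold_tags = ["B-DIS", "I-DIS", "I-DIS", "O", "B-SYM", "I-SYM"]
pred_tags = ["B-DIS", "I-DIS", "I-DIS", "O", "B-SYM", "O"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
print(precision_recall_f(gold_spans, pred_spans))
```

In this toy example the predicted symptom span stops one token early, so it does not count as a match under exact-match scoring. The same kind of effect is larger for labels with few examples, which is consistent with the limitation the abstract notes for relation types with small sample sizes.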