Information Science (情报科学), 2025, Vol. 43, Issue 6: 14-27.

• Special Topic •

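Mining Semantic Relationships Between Diseases in Electronic Medical Records Based on Entity Recognition and Co-occurrence Analysis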

  

  • Online: 2025-06-05 Published: 2025-10-16

Abstract: 【Purpose/significance】To reveal latent semantic relationships between diseases in Chinese electronic medical records and to resolve the ambiguity of those relationships.【Method/process】This paper constructs an inter-disease relationship mining model based on entity recognition and co-occurrence analysis, and conducts an empirical study on an open electronic medical record dataset. For entity recognition, it mainly uses the BERT-BiLSTM-CRF deep learning model to extract diseases and related information from the records. It then quantifies the semantic relationships between diseases with co-occurrence analysis, and finally mines those relationships using similarity calculation and hierarchical clustering.【Result/conclusion】The deep learning model for named entity recognition performs well, reaching an F1 score of 0.95 on the validation set, and the co-occurrence analysis method mines semantic relationships between diseases reasonably well.【Innovation/limitation】This paper fuses direct co-occurrence and indirect co-occurrence, and proposes a method based on comprehensive co-occurrence.
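
The abstract does not give the formulas behind the comprehensive co-occurrence measure, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It assumes that disease entities have already been extracted per record (the BERT-BiLSTM-CRF step is not reproduced), that direct co-occurrence is a pair count normalised by the largest count, that indirect co-occurrence is the cosine similarity of two diseases' co-occurrence profiles, and that the two are blended with a hypothetical weight `alpha`. The function names and the average-linkage clustering are illustrative choices, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def direct_cooccurrence(records):
    """Count how often each pair of diseases appears in the same record."""
    diseases = sorted({d for rec in records for d in rec})
    index = {d: i for i, d in enumerate(diseases)}
    counts = [[0] * len(diseases) for _ in diseases]
    for rec in records:
        for a, b in combinations(sorted(set(rec)), 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    return diseases, counts


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(counts, alpha=0.5):
    """Blend direct and indirect co-occurrence (alpha is an assumed weight)."""
    n = len(counts)
    peak = max((c for row in counts for c in row), default=0) or 1
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        direct = counts[i][j] / peak
        # The diagonal is zero, so this cosine only reflects shared neighbours.
        indirect = cosine(counts[i], counts[j])
        sim[i][j] = sim[j][i] = alpha * direct + (1 - alpha) * indirect
    return sim


def average_linkage(sim, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy input: each set stands for the diseases recognised in one record.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "coronary heart disease"},
    {"diabetes", "coronary heart disease"},
    {"gastritis", "gastric ulcer"},
    {"gastritis", "gastric ulcer"},
]
diseases, counts = direct_cooccurrence(records)
similarity = combined_similarity(counts, alpha=0.5)
groups = [{diseases[i] for i in c} for c in average_linkage(similarity, k=2)]
print(groups)
```

On this toy input the cardiovascular and metabolic diseases end up in one group and the two gastric diseases in the other. With real data, the weight `alpha` and the number of clusters `k` would need to be chosen empirically.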