情报科学 (Information Science), 2024, Vol. 42, Issue (10): 191-201.

• Business Research •

Named Entity Recognition in Online Consultation Texts by Fusing Static and Dynamic Semantic Representations

  

  • Online: 2024-10-01    Published: 2025-03-27

Abstract: 【Purpose/Significance】Online health communities contain a large volume of complex consultation texts. Multiple types of text semantic representation are combined to capture the semantic information of these texts comprehensively, so that the named entities in them can be identified accurately.

【Method/Process】First, the Word2vec and GloVe static word embedding models are each trained on the consultation texts to obtain local and global static semantic representations, respectively, which are then fused into a single static semantic representation. Second, ERNIE-Health, a dynamic word embedding model for the medical and health domain, is used to generate dynamic semantic representations. Finally, the static and dynamic semantic representations are fused and fed into a BiLSTM-CRF model, which learns the semantic dependencies of the textual context and the dependencies between entity labels, to identify the entities in the consultation texts.

【Result/Conclusion】Compared with the single static semantic representation, named entity recognition with the fused semantic representation improves precision, recall, and F1 by 2.17%, 4.07%, and 3.12%, respectively; compared with the single dynamic semantic representation, it improves them by 0.60%, 3.18%, and 1.89%, respectively.

【Innovation/Limitation】Jointly considering the static and dynamic semantic representations of consultation texts improves named entity recognition on these texts. However, some entities that have the same meaning still appear under different names, which lowers entity quality to some extent.
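As a rough illustration of the two fusion steps in the method, the sketch below assumes that each fusion is a simple per-token vector concatenation (the abstract does not state the fusion operator) and that the Word2vec, GloVe, and ERNIE-Health vectors are aligned to the same character-level tokens. The lookup tables and the `dynamic` list are made-up toy values standing in for trained models, and the helper names `fuse_static` and `fuse_static_dynamic` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def lookup(table: Dict[str, Vector], token: str, dim: int) -> Vector:
    """Return a copy of the token's vector, or a zero vector for out-of-vocabulary tokens."""
    return list(table.get(token, [0.0] * dim))


def fuse_static(tokens: Sequence[str],
                w2v: Dict[str, Vector], w2v_dim: int,
                glove: Dict[str, Vector], glove_dim: int) -> List[Vector]:
    """Static representation (assumed): Word2vec (local) vector concatenated with GloVe (global) vector."""
    return [lookup(w2v, t, w2v_dim) + lookup(glove, t, glove_dim) for t in tokens]


def fuse_static_dynamic(static: List[Vector], dynamic: List[Vector]) -> List[Vector]:
    """BiLSTM-CRF input (assumed): each token's static vector concatenated with its contextual vector."""
    if len(static) != len(dynamic):
        raise ValueError("static and dynamic vectors must be aligned token by token")
    return [s + d for s, d in zip(static, dynamic)]


# Toy example: 2-d Word2vec and 1-d GloVe tables with made-up values.
w2v = {"头": [0.1, 0.2], "痛": [0.3, 0.4]}
glove = {"头": [0.5], "痛": [0.6]}
static = fuse_static(["头", "痛", "吗"], w2v, 2, glove, 1)  # "吗" is out of vocabulary here
# Hypothetical stand-in for per-character ERNIE-Health outputs (real ones are far higher-dimensional).
dynamic = [[1.0], [2.0], [3.0]]
fused = fuse_static_dynamic(static, dynamic)
print(fused)  # → [[0.1, 0.2, 0.5, 1.0], [0.3, 0.4, 0.6, 2.0], [0.0, 0.0, 0.0, 3.0]]
```

Concatenation keeps every source's features intact and lets the BiLSTM learn how to weight them; weighted sums or gating would be alternatives if the paper uses a different operator.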
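The CRF layer of BiLSTM-CRF is what lets the model account for dependencies between entity labels. A minimal sketch of its decoding step, a Viterbi search over per-position emission scores and label-to-label transition scores, is shown below. It assumes hand-picked scores in place of a trained BiLSTM and CRF, hypothetical labels `B-SYM`/`I-SYM` for a symptom entity, and no start/end transition scores; CRF training is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the label sequence maximizing the sum of emission and transition scores.

    emissions[i][y]:     score of label y at position i (e.g. produced by a BiLSTM)
    transitions[(a, b)]: score of label a followed by label b; missing pairs score 0.0
    """
    if not emissions:
        return []
    # best[y] = best total score of any path that ends in label y at the current position
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev, score = max(
                ((p, best[p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            new_best[y] = score + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the highest-scoring final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-SYM", "I-SYM"]
# Made-up emission scores for the two characters of "头痛" (headache).
emissions = [
    {"O": 1.0, "B-SYM": 0.9, "I-SYM": 0.0},
    {"O": 0.0, "B-SYM": 0.2, "I-SYM": 1.0},
]
# Made-up transitions: O -> I-SYM is heavily penalized, B-SYM -> I-SYM is rewarded.
transitions = {("O", "I-SYM"): -100.0, ("B-SYM", "I-SYM"): 0.5}

greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
print(greedy)                                   # → ['O', 'I-SYM']
print(viterbi(emissions, transitions, labels))  # → ['B-SYM', 'I-SYM']
```

Picking each position's best label independently yields `O I-SYM`, an invalid BIO sequence; the transition scores steer the joint decoding to `B-SYM I-SYM`.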
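The reported gains are in precision, recall, and F1. The sketch below shows one common way to compute these for NER, assuming micro-averaged, entity-level exact-span matching over BIO tags; the abstract does not state the tagging scheme or matching criterion, and the entity types `SYM` and `DIS` are hypothetical.

```python
from typing import Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence.

    An I- tag that does not continue an open span of the same type starts a new span.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final open span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the open span continues
        if etype is not None:  # any other tag closes the open span
            spans.add((start, i, etype))
        if tag == "O":
            start, etype = None, None
        else:  # "B-X", or an "I-X" that cannot continue the open span
            start, etype = i, tag[2:]
    return spans


def prf1(gold_seqs: Sequence[Sequence[str]],
         pred_seqs: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 with exact span matching."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-SYM", "I-SYM", "O", "B-DIS"]]
pred = [["B-SYM", "I-SYM", "O", "O"]]
print(prf1(gold, pred))  # → (1.0, 0.5, 0.6666666666666666)
```

Because F1 is the harmonic mean of precision and recall, an F1 gain always lies between the precision and recall gains, as in both result triples above.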