情报科学 (Information Science) ›› 2024, Vol. 42 ›› Issue (2): 97-108.

• Professional Research •

Constructing a Corpus of Traditional Chinese Medicine Medical Cases for Specific Diseases: Taking Sleep Disorders as an Example

  

  • Online:2024-02-05 Published:2024-06-07

Abstract:

【Purpose/significance】Non-standardized and inconsistent terminology in medical cases reduces the efficiency and accuracy of text mining. Taking traditional Chinese medicine (TCM) sleep disorder cases as an example, this study standardizes their terminology and proposes a method for building a corpus of TCM medical cases. The corpus provides a standardized data foundation for machine understanding of medical cases, thereby improving the efficiency of TCM knowledge mining and helping to make tacit TCM knowledge explicit.
【Method/process】A large number of sleep disorder medical cases were collected. Content quality was controlled with reference to national standards from the perspective of wording and phrasing, and scientific, authoritative cases were selected as the basis of the study. Core terms were extracted from the cases and subjected to control of word form, word meaning, and inter-word relationships, and the preferred term and synonymous expressions for each meaning were identified and counted. Finally, a corpus structure based on the logic of TCM diagnosis and treatment was proposed, the terms were integrated into the TCM knowledge system, and a corpus of sleep disorder medical cases was constructed.
【Result/conclusion】The principles and process of terminology standardization for disease-specific TCM medical cases are proposed, and a corresponding corpus of standardized terms for TCM sleep disorder diagnosis and treatment cases is constructed. This supports knowledge mining of TCM medical cases and contributes to the intelligent development of TCM in the new era.
【Innovation/limitation】Building on existing research on basic TCM terminology, this study proposes a method that goes deeper into a subdivided field to standardize terminology and build a corpus for a specific disease. The number of medical cases screened in this study is limited, and the corpus is expected to be enriched in future research.
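
The term-control step summarized in the abstract (identifying a preferred term and its synonymous expressions for each meaning, then counting them) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the synonym table `PREFERRED`, the function names, and the sample terms are all hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table: surface form -> preferred term.
# Entries are illustrative only; the paper's actual preferred terms are not given here.
PREFERRED = {
    "不寐": "不寐",
    "失眠": "不寐",
    "不得眠": "不寐",
    "多梦": "多梦",
    "梦多": "多梦",
}


def normalize(terms):
    """Map each extracted term to its preferred form; unknown terms pass through unchanged."""
    return [PREFERRED.get(t, t) for t in terms]


def count_variants(terms):
    """Count how often each surface form occurs under each preferred term."""
    counts = defaultdict(Counter)
    for t in terms:
        counts[PREFERRED.get(t, t)][t] += 1
    return counts


# Made-up terms standing in for core words extracted from case records.
extracted = ["失眠", "不寐", "梦多", "失眠", "心悸"]
normalized = normalize(extracted)
variants = count_variants(extracted)
```

Keeping the per-variant counts alongside the normalized form preserves the original wording of each case, so a preferred-term choice can be revisited later without re-reading the source records.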