情报科学 (Information Science), 2021, Vol. 39, Issue 6: 152-160.

• Doctoral Forum •

Doctor Recommendation Based on Online Consultation Text Information

  

  • Online: 2021-06-01 Published: 2021-06-25

Abstract: 【Purpose/Significance】To automatically recommend doctors who match users' actual needs in online medical communities, this paper proposes a hybrid doctor recommendation algorithm based on similar users and similar doctors, using online consultation text. 【Method/Process】First, starting from a user's consultation question, the method finds users who asked similar questions and takes the doctors they chose as the recommendation set based on similar users. Next, starting from doctors' answers, it trains an LDA topic model to mine latent disease topics from the corpus of doctors' answers and, topic by topic, finds doctors with similar diagnosis and treatment experience as a second recommendation set. Finally, a hybrid similarity calculation fuses the results based on similar users and similar doctors to produce the final recommendation list. 【Result/Conclusion】An empirical study on the online medical community "39健康网" (39 Health Network) shows that the proposed method effectively reduces data dimensionality, uncovers latent semantic associations between texts, narrows the semantic gap, and improves recommendation quality. 【Innovation/Limitation】The experiments used only a small sample of department-specific data, and some parameters were set to empirical values. Future work can explore applying the method to large-scale medical datasets.
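The three steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state which similarity measure or fusion formula is used, so cosine similarity and a weighted sum with a hypothetical mixing weight `alpha` are assumptions. LDA training is not shown; the topic vectors below are hand-made placeholders standing in for the output of an already trained model, and all function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_user_scores(query_tokens, past_consultations):
    """Score each doctor by the most similar past question they were chosen for."""
    query_vec = Counter(query_tokens)
    scores = {}
    for tokens, doctor_id in past_consultations:
        sim = cosine(query_vec, Counter(tokens))
        scores[doctor_id] = max(scores.get(doctor_id, 0.0), sim)
    return scores


def similar_doctor_scores(query_topics, doctor_topics):
    """Score each doctor by topic-distribution similarity to the query.

    Assumption: topic vectors come from an already trained LDA model.
    """
    query_vec = dict(enumerate(query_topics))
    return {
        doctor_id: cosine(query_vec, dict(enumerate(topics)))
        for doctor_id, topics in doctor_topics.items()
    }


def hybrid_rank(user_scores, doctor_scores, alpha=0.5, top_n=3):
    """Blend both score sets; alpha is a hypothetical mixing weight."""
    doctors = set(user_scores) | set(doctor_scores)
    blended = {
        d: alpha * user_scores.get(d, 0.0) + (1 - alpha) * doctor_scores.get(d, 0.0)
        for d in doctors
    }
    # Sort by descending blended score, breaking ties by doctor id.
    return sorted(blended, key=lambda d: (-blended[d], d))[:top_n]


# Toy data: tokenized past questions paired with the doctor each user chose.
past_consultations = [
    (["stomach", "pain", "nausea"], "dr_a"),
    (["knee", "pain", "swelling"], "dr_b"),
]
# Placeholder topic distributions over two latent disease topics.
doctor_topics = {"dr_a": [0.9, 0.1], "dr_b": [0.1, 0.9], "dr_c": [0.8, 0.2]}

query_tokens = ["stomach", "pain"]
query_topics = [0.85, 0.15]

ranking = hybrid_rank(
    similar_user_scores(query_tokens, past_consultations),
    similar_doctor_scores(query_topics, doctor_topics),
)
print(ranking)
```

With this toy data, `dr_c` was never chosen by a user with a similar question, yet ranks second (after `dr_a`, ahead of `dr_b`) because its topic distribution is close to the query's. That is the kind of gap the similar-doctor branch is meant to cover.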