情报科学 ›› 2025, Vol. 43 ›› Issue (6): 148-155.

• Business Research •

Short Text Classification in Medical Q&A Communities Based on Keyword Expansion and a Prompt-BERT-RCNN Model


• Online: 2025-06-05  Published: 2025-10-16

Abstract: 【Purpose/significance】Achieving automatic classification of short texts in medical Q&A communities is crucial for improving service efficiency and enhancing user experience. This paper proposes a short text classification method that combines keyword expansion techniques and deep learning models to address the problems of feature sparsity and semantic ambiguity in short text classification.【Method/process】First, web crawlers are used to collect short user question texts from the medical Q&A community "www.xywy.com". Then, TF-IWF is applied to weight keyword importance, and FastText is used to calculate keyword similarity to expand the short text features. Next, prompt learning is integrated with deep learning models to construct a Prompt-BERT-RCNN model for effective classification of medical short texts.【Result/conclusion】Empirical research shows that classification performance improves significantly after keyword expansion, with the Prompt-BERT-RCNN model achieving a classification accuracy of 97.92% on the expanded medical short texts and performing excellently across nine different medical categories.【Innovation/limitation】The TF-IWF and FastText-based short text expansion method remedies the shortcomings of Word2vec, which does not account for keyword rarity or subword contextual information. The Prompt-BERT-RCNN model further improves classification accuracy by combining the guidance of prompts, BERT's deep semantic understanding, and RCNN's region-aware feature extraction capabilities. However, the model's accuracy on some topics still needs improvement.
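As a rough illustration of the expansion step described above (not the paper's actual code), TF-IWF weights a term by its in-document frequency times the log of the ratio of total corpus token count to that term's corpus frequency, so rare corpus terms score higher; the top-weighted keywords are then expanded with similar terms. The corpus statistics and the `similar` table below are hypothetical stand-ins (a real system would query trained FastText vectors):

```python
import math
from collections import Counter

def tf_iwf(doc_tokens, corpus_freq, total_corpus_tokens):
    """Score each term in a document: TF (within-document frequency)
    times IWF = log(total corpus tokens / corpus frequency of term)."""
    tf = Counter(doc_tokens)
    n = len(doc_tokens)
    return {t: (c / n) * math.log(total_corpus_tokens / corpus_freq[t])
            for t, c in tf.items()}

# Toy corpus statistics (hypothetical numbers, for illustration only)
corpus_freq = {"stomach": 40, "pain": 120, "what": 900, "cause": 300}
total_tokens = 10_000

doc = ["stomach", "pain", "what", "cause", "pain"]
scores = tf_iwf(doc, corpus_freq, total_tokens)
top_keywords = sorted(scores, key=scores.get, reverse=True)[:2]

# Expansion step: append terms similar to the top keywords.  A real
# system would take FastText nearest neighbours; a stub table stands in.
similar = {"stomach": ["abdomen", "gastric"], "pain": ["ache", "sore"]}
expanded = doc + [w for k in top_keywords for w in similar.get(k, [])]
```

The expanded token list (original question plus neighbour terms of its most important keywords) is what densifies the sparse short-text features before classification.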