情报科学 ›› 2023, Vol. 41 ›› Issue (2): 135-142.

• Business Research •

Identifying Question-and-Answer Information Adoption in Online Health Communities Based on Stacking Ensemble Learning

  

  • Online: 2023-02-01 Published: 2023-04-07

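Abstract (translated from the Chinese): 【Purpose/significance】This paper proposes a strategy, based on Stacking ensemble learning, for identifying the adoption of question-and-answer (Q&A) information, with the aim of promoting precise recommendation of Q&A in online health communities and supporting the high-quality development of digital medical services. 【Method/process】A Stacking ensemble learning model is constructed with ensemble and non-ensemble learning methods as base learners and logistic regression (LR) as the meta-learner. The prediction accuracy of single prediction models is compared with that of Stacking ensemble models built from combinations of same-type prediction models and from combinations of different-type prediction models. Chronic-disease Q&A from the "寻医问药" platform is used to build a dataset that verifies the model's superiority, and data from the "快速问医生有问必答120" platform is used to verify the model's portability. 【Result/conclusion】Compared with single prediction models, the Stacking ensemble model identifies adopted Q&A information more accurately; the model generalizes well and can be applied to different online health communities. 【Innovation/limitation】Based on the Stacking ensemble idea, this paper builds a two-stage prediction model and uses machine learning to construct the best combination of prediction models, significantly improving the accuracy of Q&A information adoption identification in online health communities. However, as Q&A information accumulates, Q&A patterns in online health communities keep evolving, so dynamic prediction methods that combine historical data with daily updated data are a focus of future research.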

Abstract: 【Purpose/significance】To promote accurate recommendation of online health community Q&A and support the high-quality development of digital medical services, this paper proposes an information adoption prediction model that follows a stacking ensemble strategy and draws on large-scale online health community Q&A information. 【Method/process】The stacking ensemble strategy uses non-ensemble and ensemble learning methods as first-layer learners and logistic regression as the meta-learner. We build the dataset from 'xywy.com' and construct predictive indicators covering text structure, online social communication records, and professional authority. We compare the prediction accuracy of single prediction models with that of stacking ensembles built from different model combinations. We then use data from the '120ask.com' platform to verify the generalization of the stacking ensemble strategy. 【Result/conclusion】The results show that the stacking ensemble strategy achieves higher prediction accuracy than single prediction models and generalizes well, so it can be applied to different online health Q&A communities. 【Innovation/limitation】Based on machine learning methods, the stacking ensemble strategy significantly improves the prediction accuracy of information adoption in online health Q&A communities. However, communication patterns in these communities keep changing, and future research should take daily updated data into account to further improve the prediction accuracy of the stacking ensemble strategy.
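
The two-layer design described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. The abstract does not name the individual base learners or list the feature columns, so the random forest, k-nearest-neighbors and decision tree learners, the synthetic data, and all hyperparameters below are illustrative assumptions rather than the authors' configuration. Only the overall shape (ensemble and non-ensemble base learners feeding a logistic regression meta-learner, compared against single models) follows the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Q&A feature table (assumption: the real indicators,
# e.g. text structure or professional authority, are not reproduced here).
# y plays the role of a hypothetical "adopted" (1) / "not adopted" (0) label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# First layer: one ensemble learner and two non-ensemble learners
# (illustrative choices, not the paper's model list).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
]

# Second layer: logistic regression fitted on out-of-fold predictions of the
# base learners (5-fold cross-validation inside StackingClassifier).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Compare each single model with the stacked model on the held-out split.
scores = {}
for name, model in base_learners + [("stacking", stack)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

On synthetic data the stacked model is not guaranteed to beat every single model; the sketch only shows how such a comparison can be set up. A portability check like the one described for '120ask.com' would amount to calling `stack.score` on a feature table built from the second platform with the same columns.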