Information Science (情报科学) ›› 2024, Vol. 42 ›› Issue (10): 181-190.

• Practice Research •

Identifying Health Anxiety in Social Media Users Based on Deep Learning

  

  • Online: 2024-10-01 Published: 2025-03-27

Abstract: 【Purpose/significance】 Analyzing social media user data to identify users' psychological states and behavior has become a research frontier, and health anxiety has become one of the public's main psychological problems. This paper aims to identify social media users with health anxiety tendencies through automatic text classification. 【Method/process】 Using the Chinese social media platform Weibo as the data source, the study collected posts retrieved with the keyword "健康焦虑" (health anxiety) and identified users with health anxiety tendencies through data cleaning, annotation, text vectorization, and classification model construction. 【Result/conclusion】 The RoBERTa-wwm model outperformed Bert-base-Chinese and other models. Concatenating the word vectors generated by the RoBERTa-wwm pre-trained language model with the psychological state feature vectors generated by the COMET model gave a better representation of the text's semantic features. The health anxiety identification model built from a gated recurrent unit (GRU), a scaled dot-product attention mechanism, and a fully connected layer performed best. 【Innovation/limitation】 This study constructed a model for identifying user health anxiety in social media environments and evaluated its performance. The results provide a useful reference for building systems that predict and identify user health anxiety and for monitoring the psychological safety of online health communities. The limitation is that the data come from a single social media platform.
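
The best-performing pipeline named in the abstract (feature concatenation, then a gated recurrent unit, scaled dot-product attention, and a fully connected layer) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract gives no dimensions, pooling strategy, or output format, so the per-token concatenation, the mean pooling, the single sigmoid output, the bias-free GRU, and every name below (`GRUCell`, `predict`, `TOKEN_DIM`, and so on) are assumptions made for illustration. Random vectors stand in for the RoBERTa-wwm and COMET outputs, and the weights are untrained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell; biases are omitted for brevity (an assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.Wz = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)  # update gate
        self.Wr = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)  # reset gate
        self.Wh = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)  # candidate state

    def step(self, x, h):
        xh = x + h  # list concatenation [x; h]
        z = [sigmoid(v) for v in matvec(self.Wz, xh)]
        r = [sigmoid(v) for v in matvec(self.Wr, xh)]
        x_rh = x + [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(v) for v in matvec(self.Wh, x_rh)]
        # One common convention: z interpolates between old and candidate state.
        return [(1 - zi) * hi + zi * ht for zi, hi, ht in zip(z, h, h_tilde)]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


def predict(token_vecs, state_vec, gru, fc_w, fc_b):
    # Assumption: the psychological-state vector is appended to every token vector.
    fused = [tok + state_vec for tok in token_vecs]
    h = [0.0] * gru.hidden_dim
    states = []
    for x in fused:
        h = gru.step(x, h)
        states.append(h)
    # Self-attention over the GRU hidden states, then mean pooling (assumed).
    attended = scaled_dot_product_attention(states, states, states)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    # Fully connected layer with a sigmoid output (binary label assumed).
    logit = sum(w * p for w, p in zip(fc_w, pooled)) + fc_b
    return sigmoid(logit)


# Toy sizes; real encoder outputs would be far larger.
rng = random.Random(0)
TOKEN_DIM, STATE_DIM, HIDDEN_DIM, SEQ_LEN = 8, 4, 6, 5
token_vecs = [[rng.gauss(0, 1) for _ in range(TOKEN_DIM)] for _ in range(SEQ_LEN)]
state_vec = [rng.gauss(0, 1) for _ in range(STATE_DIM)]
gru = GRUCell(TOKEN_DIM + STATE_DIM, HIDDEN_DIM, rng)
fc_w = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
prob = predict(token_vecs, state_vec, gru, fc_w, 0.0)
print(f"untrained probability of a health-anxiety label: {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how the fused representation would flow through the three components. In practice the same structure would more likely be written with a deep learning framework and trained on the annotated Weibo data.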