Information Science (情报科学), 2025, Vol. 43, Issue (1): 98-105.

• Business Research •

Research on Emerging Topic Prediction Based on BERTopic and LSTM Models

  


  • Online: 2025-01-05   Published: 2025-06-27


Abstract: 【Purpose/significance】Compared with retrospective detection of emerging topics, predicting emerging topics can improve the accuracy and foresight of emerging-topic identification and helps enrich the methodological toolkit for emerging-topic detection and analysis.【Method/process】First, the BERTopic model is used to obtain a series of topics for the domain. Second, a feature set for the prediction model is constructed from document frequency, citation frequency, Pscore and emerging score. Then, using each topic's feature-set data for the first three years, an LSTM model predicts the emerging scores for the following two years, from which the domain's emerging topics are identified.【Result/conclusion】An emerging-topic prediction method based on the BERTopic and LSTM models is constructed, and an empirical study is conducted in the field of data security. Comparison with BP and SVM models, and with the results of related studies, shows that the emerging topics identified by this method are more effective and reasonable.【Innovation/limitation】The method combines novelty, growth and impact features into a single indicator, the emerging score, to predict emerging topics, providing a reference for future research on emerging-topic prediction; however, it does not address the prediction of topics that may appear in the future.
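The prediction step above (three years of features in, two years of emerging scores out) can be illustrated with a minimal data-preparation sketch. It assumes each topic's features are stored as one row per year in the order document frequency, citation frequency, Pscore, emerging score, and that the emerging score is already computed; the abstract does not give its formula, so none is implemented here. The LSTM itself is omitted, and the `flag_emerging` rule and its threshold are hypothetical stand-ins for the paper's actual decision criterion.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed per-year feature order; the emerging score is taken as given input.
FEATURES = ("doc_freq", "cite_freq", "pscore", "emerging_score")
SCORE_IDX = FEATURES.index("emerging_score")

# One training sample: (n_in years of feature rows, n_out future emerging scores).
Window = Tuple[List[List[float]], List[float]]


def make_windows(yearly: Sequence[Sequence[float]], n_in: int = 3, n_out: int = 2) -> List[Window]:
    """Slide over one topic's yearly feature rows and emit (input, target) pairs.

    input:  n_in consecutive years of full feature rows (n_in x len(FEATURES))
    target: the emerging score of each of the following n_out years
    """
    samples: List[Window] = []
    for start in range(len(yearly) - n_in - n_out + 1):
        x = [list(row) for row in yearly[start:start + n_in]]
        y = [yearly[t][SCORE_IDX] for t in range(start + n_in, start + n_in + n_out)]
        samples.append((x, y))
    return samples


def flag_emerging(predicted: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Hypothetical decision rule: a topic counts as emerging when its mean
    predicted emerging score over the forecast horizon exceeds `threshold`."""
    return sorted(t for t, scores in predicted.items() if sum(scores) / len(scores) > threshold)


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    topic = [
        [10, 2, 0.1, 0.20],
        [14, 5, 0.2, 0.30],
        [21, 9, 0.3, 0.45],
        [30, 15, 0.4, 0.60],
        [42, 24, 0.5, 0.70],
    ]
    for x, y in make_windows(topic):
        print(len(x), "input years ->", y)
    print(flag_emerging({"topic_a": [0.6, 0.7], "topic_b": [0.1, 0.2]}, threshold=0.5))
```

With exactly five years of data per topic, this yields one sample per topic (years 1-3 as input, years 4-5 as target). Stacking the inputs of many topics gives an array of shape (samples, 3, 4), which is the (batch, time steps, features) layout that common LSTM implementations accept.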