情报科学 (Information Science), 2024, Vol. 42, Issue (9): 178-191.

• Doctoral Forum •

Research on a High-Value Hot Topic Mining Method Based on Time Series Clustering and the Skyline Algorithm

  

  • Publication date: 2024-09-01   Release date: 2024-11-06


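Abstract: 【Purpose/significance】Existing studies screen keywords on only a few indicator dimensions, and their methods for literature topic evolution and ranking are one-dimensional. To address these problems, this paper proposes a high-value hot topic mining method based on time series clustering and the skyline algorithm.【Method/process】First, keywords are stratified by value with the RFM model to obtain those at high value levels. Initial research topics of the literature are then obtained by applying a community detection algorithm to the constructed semantic relation network. Next, the initial topic clusters undergo a second round of affinity propagation clustering to reveal evolutionary patterns shared by topics with similar development characteristics; at the same time, indicators characterizing the relative importance of topics are extracted, and the topics are ranked with the skyline algorithm and principal component analysis. Finally, taking "urban and rural community supply and demand services" as the theme, related CNKI literature from 1998 to 2022 is retrieved and mined with this method.【Result/conclusion】The proposed method jointly considers the temporal dimension and the value attributes of keywords, and offers a relatively systematic topic mining approach that integrates topic identification, evolution, and ranking. A comparative analysis of the experimental results shows that the method can effectively identify high-value hot topics and evaluate topic hotness comprehensively across multiple dimensions.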
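The abstract does not define recency, frequency, and monetary value for keywords. The following is a minimal sketch of RFM value stratification, assuming R is the number of years since a keyword last appeared (smaller is better), F is the number of documents containing it, and M is a hypothetical value proxy such as the total citations of those documents. Each dimension is split at its mean, and a keyword counts as high value only when it is high on all three. The class `KeywordStats` and the functions `rfm_segments` and `high_value_keywords` are illustrative names, not the paper's.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KeywordStats:
    keyword: str
    last_year: int   # most recent publication year in which the keyword appears
    frequency: int   # number of documents containing the keyword
    monetary: float  # hypothetical value proxy, e.g. total citations


def rfm_segments(stats, current_year):
    """Label each keyword with a segment string such as 'HHH' or 'HLL'.

    Recency = current_year - last_year, so a keyword is 'H' on R when its
    recency is at or below the mean. F and M are 'H' when at or above
    their respective means.
    """
    recency = {s.keyword: current_year - s.last_year for s in stats}
    r_mean = mean(recency.values())
    f_mean = mean(s.frequency for s in stats)
    m_mean = mean(s.monetary for s in stats)
    segments = {}
    for s in stats:
        r = "H" if recency[s.keyword] <= r_mean else "L"
        f = "H" if s.frequency >= f_mean else "L"
        m = "H" if s.monetary >= m_mean else "L"
        segments[s.keyword] = r + f + m
    return segments


def high_value_keywords(stats, current_year):
    """Return, sorted, the keywords that are high on all three RFM dimensions."""
    segments = rfm_segments(stats, current_year)
    return sorted(k for k, seg in segments.items() if seg == "HHH")
```

Mean splitting yields the classic eight RFM segments; quantile or k-means splits are common alternatives when the indicators are heavily skewed.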

Abstract: 【Purpose/significance】Existing research on literature topic mining screens keywords on only a few indicator dimensions, and its methods for topic evolution and ranking are one-dimensional. To address these problems, this paper proposes a high-value hot topic mining method based on time series clustering and the skyline algorithm.【Method/process】First, the paper stratifies keywords by value with the RFM model to obtain keywords at high value levels. Initial research topics are then obtained by applying a community detection algorithm to a semantic relation network built from these keywords. Next, the initial topic clusters are clustered a second time with affinity propagation to reveal evolutionary patterns shared by topics with similar development characteristics. At the same time, we extract indicators that characterize the relative importance of topics and rank the topics with the skyline algorithm and principal component analysis. Finally, journal literature on "urban and rural community supply and demand services" published from 1998 to 2022 in CNKI (China National Knowledge Infrastructure) is processed and mined with the method.【Result/conclusion】The proposed method integrates the temporal dimension and the value attributes of keywords, and provides a relatively systematic topic mining method that combines topic identification, evolution, and ranking. A comparative analysis of the results shows that the method can accurately and quickly identify high-value hot topics and evaluate topic hotness comprehensively across multiple dimensions.【Innovation/limitation】The methods this study uses to measure keyword value and topic popularity focus mainly on optimizing evaluation criteria and measurement indicators, and do not cover every factor that might affect these indicators.
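The abstract does not say exactly how the skyline algorithm and principal component analysis are combined. One plausible reading, sketched here under stated assumptions: every topic indicator is oriented so that larger is better; topics are grouped into successive skyline layers (the skyline is the set of topics that no other topic dominates, where one topic dominates another if it is at least as good on every indicator and strictly better on at least one); ties within a layer are broken by each topic's score on the first principal component of the standardized indicators. The layering scheme and all function names are illustrative, not taken from the paper.

```python
from statistics import mean, pstdev


def dominates(a, b):
    """True if a >= b on every indicator and a > b on at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points):
    """Peel successive skylines; returns a list of index lists, best layer first."""
    remaining = list(range(len(points)))
    layers = []
    while remaining:
        # A point stays in this layer if no other remaining point dominates it.
        layer = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers


def first_pc_scores(points, iters=200):
    """Project z-scored indicators onto the first principal component.

    The leading eigenvector of the covariance matrix of the standardized data
    is found by power iteration; its sign is flipped so loadings sum to >= 0.
    """
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant indicators
    z = [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in points]
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    v = [float(i + 1) for i in range(d)]  # generic start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(a * b for a, b in zip(row, v)) for row in z]


def rank_topics(names, points):
    """Order topics by skyline layer, then by descending PC1 score within a layer."""
    scores = first_pc_scores(points)
    order = []
    for layer in skyline_layers(points):
        order.extend(sorted(layer, key=lambda i: -scores[i]))
    return [names[i] for i in order]
```

Using dominance as the primary criterion keeps the ranking free of indicator weights wherever one topic is clearly better than another; the PCA score only orders topics that dominance cannot separate.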