Information Science (情报科学), 2021, Vol. 39, Issue (2): 129-136.

• Business Research •

Research on Clustering Public Opinion Events Based on NRL and k-means

  

  • Publication date: 2021-02-01   Release date: 2021-03-11


Abstract (Chinese version):

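【Purpose/significance】Clustering online public opinion events not only makes public opinion information more hierarchical and better organized, but also supports follow-up research such as personalized recommendation of public opinion events.
【Method/process】By combining network representation learning (NRL) with K-means, public opinion events are clustered in four stages: event collection, event co-occurrence frequency analysis, dimensionality-reduction mapping of events, and cluster analysis. After the events are collected, an event co-occurrence matrix is constructed from the co-occurrence relations between events, and NRL algorithms are applied to obtain a low-dimensional vector representation of each event. K-means is then used for clustering: first, the number of groups is determined and the initial clusters are formed; next, each cluster center is updated to the mean of the low-dimensional vectors of the events in that cluster; these steps are repeated until clustering is complete.
【Result/conclusion】An empirical study on 220 public opinion events already classified by the Eefung (蚁坊) public opinion monitoring software shows that K-means clustering combined with NRL achieves relatively good clustering results.
【Innovation/limitation】Building on the mining of public opinion events, this paper proposes a k-means clustering method that incorporates network representation learning and yields clearly organized groups of public opinion events. However, the amount of data available to an individual study is limited, which makes the optimal clustering result hard to reach; Internet information platforms, which hold massive user data, could achieve better clustering and thereby support follow-up research such as personalized recommendation.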
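The co-occurrence and dimensionality-reduction stages can be illustrated with a minimal sketch. The abstract specifies neither what counts as a co-occurrence nor which NRL algorithm is used, so two assumptions are made here and should not be read as the paper's method: a hypothetical "basket" is any unit in which several events appear together (for example, events mentioned by the same user or in the same post), and a simple spectral embedding (the top eigenvectors of a normalized co-occurrence matrix, found by power iteration) stands in for a learned NRL model such as DeepWalk or node2vec. The function names `cooccurrence_matrix` and `spectral_embedding` are illustrative.

```python
import math
import random
from itertools import combinations


def cooccurrence_matrix(baskets, events):
    """Symmetric matrix: entry [i][j] counts baskets containing both events i and j.

    Assumption (hypothetical): a "basket" is any unit in which events co-occur.
    """
    index = {e: i for i, e in enumerate(events)}
    n = len(events)
    m = [[0.0] * n for _ in range(n)]
    for basket in baskets:
        # Each unordered pair of distinct events in a basket co-occurs once.
        for a, b in combinations(sorted(set(basket)), 2):
            i, j = index[a], index[b]
            m[i][j] += 1.0
            m[j][i] += 1.0
    return m


def spectral_embedding(m, dim=2, iters=500, seed=0):
    """Spectral stand-in for NRL (not the paper's algorithm): one dim-sized vector per event."""
    n = len(m)
    # Self-loops give every event a positive degree (avoids division by zero).
    w = [[m[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Lazy normalized matrix (I + D^-1/2 W D^-1/2) / 2 has eigenvalues in [0, 1],
    # so power iteration converges to the largest ones, which carry group structure.
    s = [[0.5 * ((1.0 if i == j else 0.0) + w[i][j] / math.sqrt(deg[i] * deg[j]))
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vectors = []
    for _ in range(dim):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            v = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
            # Deflation: project out the eigenvectors that were already found.
            for u in vectors:
                dot = sum(a * b for a, b in zip(u, v))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        vectors.append(v)
    # Row i is the low-dimensional vector of event i.
    return [[vec[i] for vec in vectors] for i in range(n)]


if __name__ == "__main__":
    # Toy data (made up): two tightly linked event groups joined by one weak link C-D.
    events = ["A", "B", "C", "D", "E", "F"]
    baskets = [["A", "B", "C"]] * 3 + [["D", "E", "F"]] * 3 + [["C", "D"]]
    for event, vec in zip(events, spectral_embedding(cooccurrence_matrix(baskets, events))):
        print(event, [round(x, 3) for x in vec])
```

In the toy data, events that co-occur often receive nearly identical vectors, while the two groups end up on opposite sides of the second coordinate; that separation is what the subsequent K-means step exploits. Swapping in a real NRL method would change only `spectral_embedding`.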

Abstract:

【Purpose/significance】Clustering online public opinion events not only makes public opinion information more hierarchical and better organized, but also supports follow-up research such as personalized recommendation of public opinion events.
【Method/process】Combining Network Representation Learning (NRL) with K-means, public opinion events are clustered in four stages: event collection, event co-occurrence frequency analysis, dimensionality-reduction mapping of events, and cluster analysis. After the events are collected, an event co-occurrence matrix is built from the co-occurrence relationships between events, and NRL algorithms are used to obtain a low-dimensional vector representation of each event. K-means is then applied: the number of groups is determined and the initial clusters are formed; each cluster center is updated to the mean of the low-dimensional vectors of the events in that cluster; and these steps are iterated until clustering is complete.
【Result/conclusion】Using 220 public opinion events already classified by the Eefung public opinion monitoring software, the experiments show that K-means clustering integrated with NRL achieves relatively good clustering results.
【Innovation/limitation】Based on the mining of public opinion events, we propose a k-means clustering method that integrates network representation learning to obtain clearly organized public opinion events. However, the amount of data available to an individual researcher is limited, which makes the optimal clustering result hard to achieve. Internet information platforms hold massive user data and could achieve better clustering results for personalized recommendation and other follow-up research.
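The K-means stage described in the abstract (fix the number of groups, form initial clusters, move each center to the mean of its members' vectors, iterate) corresponds to standard Lloyd iterations, which the following minimal sketch implements. The assignment step (each event joins its nearest center) is the standard K-means counterpart of the update step and is not spelled out in the abstract. How the paper forms its initial clusters and decides when to stop is not stated, so choosing k random events as initial centers and stopping once no event changes cluster are assumptions; the function name `kmeans` is illustrative.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd-style K-means on low-dimensional event vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    # Initial clusters (assumption): k events picked at random act as the first centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each event joins the cluster whose center is nearest.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # no event changed cluster, so clustering is complete
        labels = new_labels
        # Update step: each center becomes the mean of its members' vectors.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy 2-D vectors (made up) standing in for NRL event representations.
    vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    labels, centers = kmeans(vectors, k=2)
    print(labels, centers)
```

In the paper's setting, `points` would be the low-dimensional event vectors produced by NRL and `k` the chosen number of groups. Because random initialization can land in a poor local optimum, running several seeds and keeping the solution with the smallest within-cluster distance is a common precaution.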