Information Science (情报科学) ›› 2021, Vol. 39 ›› Issue (3): 136-142.


Extracting Representative Negative Reviews Based on the Fusion of Gaussian LDA and Spectral Clustering

  

  • Online: 2021-03-01 Published: 2021-03-15

Abstract:

【Purpose/significance】Online reviews, especially negative reviews, are an important basis for consumers' purchasing decisions. However, existing methods for reducing information redundancy still leave room for improvement when applied to negative online reviews.
【Method/process】This paper proposes a spectral clustering method for negative reviews based on Gaussian LDA. First, the Gaussian LDA model is used to obtain the topic distributions of the negative reviews. The Pearson similarity between reviews is then calculated from these topic distributions, and a spectral clustering algorithm is applied to cluster the negative reviews. Finally, the m reviews closest to the cluster center in each cluster are extracted as that cluster's representative reviews.
【Result/conclusion】Gaussian LDA, LDA, TF-IDF, and Doc2Vec were each combined with spectral clustering, and Gaussian LDA was combined with K-means, DBSCAN, and spectral clustering, in a cross-comparison that verified the superiority of the proposed method. The negative reviews extracted in this way are well separated across clusters and highly representative, and they largely resolve the information redundancy problem.
【Innovation/limitation】The multi-model ensemble approach of extracting topics first and then clustering provides a new method and line of thinking for solving review information redundancy, and a new reference for research on text mining and text clustering.
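
The pipeline described in the abstract (topic distributions, Pearson similarity, spectral clustering, then the m reviews nearest each cluster center) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a document-topic matrix `theta` is already available (toy values stand in for Gaussian LDA output), rescales the Pearson correlation to [0, 1] to obtain a non-negative affinity, uses a normalized spectral embedding followed by a simple k-means, and takes "cluster center" to mean the centroid of a cluster's topic distributions. All function names are hypothetical.

```python
import numpy as np


def pearson_affinity(theta):
    """Pairwise Pearson correlation between rows, rescaled from [-1, 1] to [0, 1].

    Assumption: the rescaling is one simple way to get a non-negative affinity.
    Rows with zero variance would produce NaN and are not handled here.
    """
    return (np.corrcoef(theta) + 1.0) / 2.0


def spectral_embedding(affinity, k):
    """Row-normalised top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    w = affinity.copy()
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    norm_w = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_w)  # eigenvalues come back in ascending order
    emb = vecs[:, -k:]
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def representative_reviews(theta, k, m):
    """Cluster reviews and pick, per cluster, the m reviews nearest its centroid.

    Returns (labels, reps), where reps maps a cluster label to review indices.
    Assumption: the centroid is computed in topic-distribution space.
    """
    labels = kmeans(spectral_embedding(pearson_affinity(theta), k), k)
    reps = {}
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        centroid = theta[members].mean(axis=0)
        order = np.argsort(np.linalg.norm(theta[members] - centroid, axis=1))
        reps[j] = members[order[:m]].tolist()
    return labels, reps


# Toy document-topic matrix: 6 reviews over 3 topics (illustrative values only).
theta = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.15, 0.10, 0.75],
])
labels, reps = representative_reviews(theta, k=2, m=1)
print(labels, reps)
```

In practice, a library routine such as scikit-learn's `SpectralClustering` with `affinity="precomputed"` could replace the hand-written embedding and k-means steps. The explicit version is shown here only to make each stage of the pipeline visible.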