情报科学 (Information Science) ›› 2022, Vol. 40 ›› Issue (11): 78-84.

• Practice Research •

Topic Mining for User Reviews: A Case Study of Meituan

  

  • Online: 2022-11-01  Published: 2022-12-13

Abstract: [Purpose/significance] Effective keywords are extracted from a large volume of buffet restaurant user reviews to build topics and topic words, helping merchants understand their word-of-mouth reputation and thereby improve management in the catering industry and serve users more precisely. [Method/process] Three keyword extraction methods, TF-IDF, TextRank, and LMKE, are fused to obtain the best keywords. The extracted keywords then undergo semantic clustering, topic identification, topic word mining, and topic weight calculation. Finally, the effectiveness of the method is verified on a dataset collected from Meituan. [Result/conclusion] The experiments show that fusing the three keyword extraction methods outperforms any single keyword extraction algorithm. Clustering the review text yields five topics: taste, dishes, environment, service, and price. Ranked by importance, they are taste (36.2%), service (22.9%), price (15.1%), environment (13.6%), and dishes (12.2%). The results confirm that the method can effectively identify and construct topics and topic words and determine what users focus on within each topic. It also provides a theoretical and technical basis for research on mining and applying topics and topic words in the catering industry. [Innovation/limitation] A semi-supervised semantic clustering method for topic identification, topic word construction, and topic weight evaluation is proposed. A limitation is that the experiment mainly uses buffet restaurant reviews from the Wuhan area, so the constructed topics have a limited range of applicability.
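The abstract does not state how the three extractors are fused or how topic weights are computed, so the following Python sketch is only an illustration under stated assumptions. It assumes each extractor returns a keyword-to-score table, fuses them by averaging min-max normalized scores, and takes a topic's weight to be its share of the total fused score. The function names, the fusion rule, the weight formula, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one extractor's scores to [0, 1] so extractors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {kw: 1.0 for kw in scores}
    return {kw: (s - lo) / (hi - lo) for kw, s in scores.items()}


def fuse_keyword_scores(score_tables):
    """Assumed fusion rule: mean of normalized scores.

    A keyword missing from one extractor contributes 0 for that extractor.
    """
    fused = defaultdict(float)
    for table in score_tables:
        for kw, s in min_max_normalize(table).items():
            fused[kw] += s / len(score_tables)
    return dict(fused)


def topic_weights(fused, keyword_to_topic):
    """Assumed weight formula: each topic's share of the total fused score."""
    totals = defaultdict(float)
    for kw, score in fused.items():
        topic = keyword_to_topic.get(kw)
        if topic is not None:  # keywords without a topic are ignored
            totals[topic] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {topic: 0.0 for topic in totals}
    return {topic: value / grand_total for topic, value in totals.items()}


# Toy scores standing in for TF-IDF, TextRank, and LMKE output (made up).
tfidf = {"tasty": 0.9, "waiter": 0.4, "cheap": 0.2}
textrank = {"tasty": 0.5, "waiter": 0.5, "cheap": 0.1}
lmke = {"tasty": 0.8, "cheap": 0.6}

# Keyword-to-topic assignment, standing in for the clustering step (made up).
keyword_to_topic = {"tasty": "taste", "waiter": "service", "cheap": "price"}

fused = fuse_keyword_scores([tfidf, textrank, lmke])
weights = topic_weights(fused, keyword_to_topic)
for topic, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{topic}: {weight:.1%}")
```

Normalizing before averaging keeps an extractor with a larger raw score range from dominating the fused ranking. Other fusion rules, such as rank-based voting, would fit the abstract's description equally well.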