情报科学 (Information Science), 2022, Vol. 39, Issue 2: 65-73.

• Theoretical Research •

Constructing an Online Review Knowledge Graph Based on Multi-Source Heterogeneous Data Mining

  

  • Online: 2022-02-01  Published: 2022-02-23

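Abstract (translated from Chinese): 【Purpose/significance】With the spread of online shopping, online reviews have become important data influencing the decisions of consumers, sellers, and producers. In the big-data era, online reviews are multi-source, heterogeneous, and growing explosively, which makes it hard for them to provide strong intelligence support for users' purchasing decisions and for competition among merchants.【Method/process】This paper builds a knowledge graph from multi-source heterogeneous online review data and proposes a framework for constructing knowledge graphs from such data. The schema layer is built around the source, content, and form of online reviews, yielding the conceptual framework of the knowledge graph. word2vec is then used to obtain entities, relations, and attributes from the multi-source heterogeneous text, followed by data fusion and knowledge graph analysis.【Result/conclusion】Taking online reviews of mobile phones as an example, the experiments verify the effectiveness of the constructed knowledge graph for research on and mining of online reviews. The results reveal the characteristics of multi-source heterogeneous online review data and offer a new research perspective on organizing, presenting, and mining online review information in a big-data environment.【Innovation/limitation】Describing online reviews with a knowledge graph effectively addresses problems such as information overload and the fusion of multi-source heterogeneous information. The knowledge graph is built semi-automatically; future work will consider unsupervised methods to improve construction efficiency.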

Abstract: 【Purpose/significance】With the popularity of online shopping, online reviews have become important data that affect the decision-making of consumers, sellers, and producers. In the era of big data, online reviews are multi-source, heterogeneous, and growing explosively, which makes it difficult for them to provide strong intelligence support for users' purchasing decisions and for competition among merchants.【Method/process】This article therefore proposes a framework for constructing an online review knowledge graph from multiple data sources. The schema layer is constructed top-down around the source, content, and form characteristics of online reviews to form the conceptual framework of the knowledge graph. Entities, relationships, and attributes are then extracted from multi-source heterogeneous texts, followed by data fusion and knowledge graph analysis.【Result/conclusion】The experimental part takes online reviews of mobile phones as an example to verify the effectiveness of the constructed knowledge graph for online review mining and analysis. The research results reveal the characteristics of multi-source heterogeneous online review data and provide a new research perspective for the organization, display, and mining of online review information in a big data environment.【Innovation/limitation】Using knowledge graphs to describe online reviews effectively addresses problems such as information overload and multi-source heterogeneous information fusion. This article uses a semi-automated method to construct the knowledge graph; future work will consider introducing an unsupervised method to improve construction efficiency.
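
The abstract does not spell out how the data fusion step works, so the following is only a minimal sketch of one plausible reading: triples extracted from several review sources are normalized to canonical entity names and merged, with the contributing sources kept as provenance. The alias table, the relation name `has_aspect`, the source names, and the sample triples are all hypothetical and are not taken from the paper. A hard-coded alias lookup stands in for whatever similarity measure (for example, word2vec vectors) the authors actually use.

```python
from collections import defaultdict

# Hypothetical alias table: maps surface forms seen in different sources
# to one canonical entity name. Hard-coded here purely for illustration.
ALIASES = {
    "battery life": "battery",
    "batt": "battery",
    "display": "screen",
}


def canonical(name: str) -> str:
    """Lower-case and trim a name, then resolve it through the alias table."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def fuse(triples_by_source):
    """Merge (head, relation, tail) triples from several sources.

    Returns a dict mapping each canonical triple to the set of sources
    that asserted it, so provenance survives the merge.
    """
    graph = defaultdict(set)
    for source, triples in triples_by_source.items():
        for head, relation, tail in triples:
            graph[(canonical(head), relation, canonical(tail))].add(source)
    return dict(graph)


# Toy, made-up input: two review sites describing the same phone.
reviews = {
    "site_a": [
        ("Phone X", "has_aspect", "Battery life"),
        ("Phone X", "has_aspect", "screen"),
    ],
    "site_b": [
        ("phone x", "has_aspect", "batt"),
        ("phone x", "has_aspect", "Display"),
    ],
}

fused = fuse(reviews)
for triple, sources in sorted(fused.items()):
    print(triple, sorted(sources))
```

In this toy input the four raw triples collapse into two canonical ones, each attributed to both sites. Keeping the source set on every triple is a design choice that fits the abstract's emphasis on the source of a review as one dimension of the schema layer.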