情报科学 (Information Science) ›› 2023, Vol. 41 ›› Issue (1): 166-173.

• Doctoral Forum •

Online Book Similarity Calculation Based on Fine-Grained Review Mining

  

  • Publication date: 2023-01-01 Release date: 2023-04-06

Abstract: 【Purpose/significance】This study applies deep learning to fine-grained mining of book reviews and uses the mining results to improve the calculation of similarity between books.【Method/process】Book reviews are first collected from online book review websites, and an attribute lexicon is constructed through part-of-speech (POS) analysis of the reviews. The reviews are then labeled by type based on the attribute lexicon, and a BERT-BiLSTM model is trained on the labeled data to classify reviews automatically. Finally, the classified reviews are represented as vectors using BERT, and the cosine similarity between reviews is calculated to represent the similarity between books.【Result/conclusion】The accuracy, recall and F1 value of the BERT-BiLSTM review classification model constructed in this paper reach 0.922, 0.921 and 0.921 respectively, so the model achieves good review classification results. Using the model to divide reviews into five types (writing style, characters, plot, summary and reader attitude) and calculating the similarity between books on that basis yields fairly consistent similarity results.【Innovation/limitation】Compared with the other review types, calculating book similarity from character and plot reviews is less effective and needs improvement. A more fine-grained analysis of these two types of reviews can be carried out in future work.
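
The final step of the method, comparing books through the cosine similarity of their classified review vectors, can be illustrated with a minimal sketch. It is not the authors' implementation: the short toy vectors stand in for BERT embeddings, the function names are hypothetical, and averaging each book's review vectors per type before comparison is an assumption, since the abstract does not say how review-level similarities are aggregated into a book-level score.

```python
import math
from collections import defaultdict

# The five review types named in the abstract.
REVIEW_TYPES = ["writing style", "characters", "plot", "summary", "reader attitude"]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def book_similarity_by_type(reviews_a, reviews_b):
    """Compare two books type by type.

    Each argument is a list of (review_type, vector) pairs for one book.
    Assumption: a book's reviews of one type are averaged into a single
    vector before the cosine similarity is taken.
    """
    grouped_a, grouped_b = defaultdict(list), defaultdict(list)
    for review_type, vector in reviews_a:
        grouped_a[review_type].append(vector)
    for review_type, vector in reviews_b:
        grouped_b[review_type].append(vector)

    scores = {}
    for review_type in REVIEW_TYPES:
        # Only types reviewed for both books can be compared.
        if grouped_a[review_type] and grouped_b[review_type]:
            scores[review_type] = cosine_similarity(
                mean_vector(grouped_a[review_type]),
                mean_vector(grouped_b[review_type]),
            )
    return scores


# Toy 3-dimensional vectors in place of real BERT embeddings.
book_a = [
    ("plot", [1.0, 0.0, 0.0]),
    ("plot", [0.0, 1.0, 0.0]),
    ("writing style", [1.0, 2.0, 3.0]),
]
book_b = [
    ("plot", [1.0, 0.0, 0.0]),
    ("writing style", [2.0, 4.0, 6.0]),
    ("summary", [0.0, 0.0, 1.0]),
]
scores = book_similarity_by_type(book_a, book_b)
for review_type, score in scores.items():
    print(f"{review_type}: {score:.3f}")
```

Keeping one score per review type, rather than a single score per book pair, mirrors the abstract's observation that some types (characters, plot) give weaker similarity results than others and so are worth inspecting separately.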