Information Science (情报科学), 2022, Vol. 40, Issue (4): 156-165.

• Doctoral Forum •

Research on Academic Paper Recommendation Based on Fine-Grained Semantic Entities

  

  • Online: 2022-04-01 Published: 2022-05-15

Abstract: [Purpose/Significance] To help researchers quickly and accurately find academic papers related to their research interests, this paper constructs an academic paper recommendation model based on fine-grained semantic entities. [Method/Process] Three types of semantic entities identified in earlier experiments (research topics, research objects, and theories and techniques) serve as the content features of academic papers and core authors. The TF-IDF algorithm, the TextRank algorithm, and the LDA model are each used to obtain feature words for the papers and the core authors. The feature words are vectorized with Word2vec, the cosine similarity between each core author and each paper is computed, and the 20 papers with the highest cosine similarity are recommended to the author. [Result/Conclusion] Precision, recall, and F-measure are used to compare the recommendation results generated from the feature words produced by the three algorithms. The results show that feature words obtained with the TF-IDF algorithm yield the best recommendations. Example recommendations are also presented; they indicate that the proposed model can more comprehensively recommend academic papers matching researchers' interests and thereby improve research efficiency. [Innovation/Limitation] The work starts from the content features of academic papers: keywords subdivided by type are screened with different algorithms to select core-author feature words, which then drive the recommendation. It does not address the network relationships contained in academic papers.
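
The ranking and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy embedding table stands in for trained Word2vec vectors, averaging word vectors into a single profile vector is an assumed pooling choice (the abstract does not say how word vectors are combined), and all function names, paper ids, and feature words are hypothetical. The abstract's Top 20 cut-off corresponds to `k=20`; the usage lines use `k=2` only because the toy corpus has three papers.

```python
import numpy as np

# Toy 3-dimensional embeddings standing in for trained Word2vec vectors.
# The words and values are hypothetical and chosen only for illustration.
EMBEDDINGS = {
    "recommendation": np.array([0.9, 0.1, 0.0]),
    "semantic": np.array([0.8, 0.2, 0.1]),
    "entity": np.array([0.7, 0.3, 0.0]),
    "polymer": np.array([0.0, 0.1, 0.9]),
    "catalyst": np.array([0.1, 0.0, 0.8]),
}


def mean_vector(words):
    """Average the vectors of the feature words that have an embedding.

    Mean pooling is an assumption of this sketch, not a detail from the paper.
    """
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no feature word has an embedding")
    return np.mean(vectors, axis=0)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def recommend(author_words, papers, k=20):
    """Return the ids of the k papers most similar to the author profile."""
    author_vec = mean_vector(author_words)
    scored = [
        (cosine(author_vec, mean_vector(words)), paper_id)
        for paper_id, words in papers.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper_id for _, paper_id in scored[:k]]


def precision_recall_f(recommended, relevant):
    """Precision, recall, and their harmonic mean for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical feature words per paper, e.g. as selected by TF-IDF.
papers = {
    "p1": ["recommendation", "semantic"],
    "p2": ["polymer", "catalyst"],
    "p3": ["entity", "semantic"],
}

top = recommend(["recommendation", "entity"], papers, k=2)
precision, recall, f_value = precision_recall_f(top, relevant={"p1"})
print(top, precision, recall, f_value)
```

Swapping the feature-word lists for those produced by TextRank or LDA and rerunning `precision_recall_f` would give the kind of three-way comparison the abstract reports, under the same assumptions.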