情报科学 (Information Science), 2023, Vol. 41, Issue 2: 107-117.

• Professional Research •

Identification and Evolution Analysis of Cited Topics of Papers Based on Citation Content Clustering

  

  • Online: 2023-02-01 Published: 2023-04-07

Abstract: 【Purpose/significance】Because citation motives differ, the cited topics and points of emphasis vary across the many citations a paper receives. Identifying these cited topics and analyzing how they change supports citation motivation analysis and improves literature recommendation.
【Method/process】We first extract the citation context of the cited paper and define several scopes of citation content according to text length. We then combine multiple text clustering methods to identify cited topics and compare their similarities and differences. Finally, we analyze the evolution paths and processes of the cited topics through time-series comparison.
【Result/conclusion】An analysis of representative highly cited papers in the field of artificial intelligence shows that the preceding and following sentences are important supplements to the citation sentence, and that the citation sentence combined with its preceding and following sentences better reveals the cited topics. Cited topics derived from citation content are diverse, revealing both extensions of the original content and differences in citation motives. Evolution analysis of cited topics effectively reveals the directions, themes, methods, and techniques in which the original content is applied or improved.
【Innovation/limitation】This paper builds a framework for identifying cited topics and analyzing their evolution based on citation content clustering, demonstrates the differentiation of cited topics and their supplementary role to the original text, and reveals the characteristics and practical significance of temporal changes in the topics of citation content. Future research needs to expand the sample so that the results generalize better.
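
Two steps of the method summarized in the abstract can be illustrated with a minimal Python sketch: building citation-content scopes (the citation sentence alone, with its preceding sentence, and with both neighbors) and tabulating topic labels per year for a time-series comparison. This is not the authors' implementation. The function names, the naive sentence splitter, the `[12]` citation marker, and the topic labels are all assumptions made for illustration, and the clustering step itself is left out: the labels are assumed to come from whichever text clustering method is applied to the extracted scopes.

```python
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    # It will mis-split abbreviations such as "et al."; a real pipeline needs a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def citation_contexts(text, marker):
    """For each sentence containing `marker`, return three scopes of citation content."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if marker not in sentence:
            continue
        prev_s = sentences[i - 1] if i > 0 else ""
        next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
        contexts.append({
            "sentence": sentence,
            "with_prev": " ".join(x for x in (prev_s, sentence) if x),
            "with_prev_next": " ".join(x for x in (prev_s, sentence, next_s) if x),
        })
    return contexts


def topic_share_by_year(records):
    """Turn (year, topic_label) pairs into per-year topic proportions, sorted by year."""
    counts = defaultdict(Counter)
    for year, label in records:
        counts[year][label] += 1
    return {
        year: {label: n / sum(c.values()) for label, n in c.items()}
        for year, c in sorted(counts.items())
    }


# Hypothetical citing passage and hypothetical cluster labels, for illustration only.
passage = (
    "Deep networks are hard to train. "
    "Residual learning [12] adds shortcut connections. "
    "This eases optimization."
)
contexts = citation_contexts(passage, "[12]")
shares = topic_share_by_year([
    (2016, "architecture"), (2016, "architecture"),
    (2017, "architecture"), (2017, "detection"),
])
```

Comparing clustering results across the three scopes corresponds to the abstract's question of whether neighboring sentences add information, and the per-year shares give a simple basis for tracing how cited topics shift over time.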