Information Science (情报科学), 2021, Vol. 39, Issue (5): 156-162.

• Practice Research (业务研究) •

Research Frontier Fields and Evolutionary Development Trends of University Papers from the Perspective of Text Topics

  

  • Online: 2021-05-01 Published: 2021-05-12

Abstract:

【Purpose/significance】This paper uses text mining to analyze the frontier topics and development trends of university papers,
offering a new perspective for moving university research evaluation beyond the "four onlys" (四唯) and providing a model
reference and theoretical support for universities selecting their directions of research strength.【Method/process】First,
bibliographic records of high-level university papers are collected, the PLDA model is used to identify research topics, and an
identification model is proposed that characterizes university research frontier topics with indicators such as topic heat and
topic novelty. Then, cosine similarity is used to measure the research similarity between different topics, and the evolution of
each university's research topics is analyzed to reveal how those topics develop and change. The frontier areas of university
research topics are identified by comparing their degree of fit with the content of world research fronts.【Result/conclusion】The
experimental results show that the proposed method can effectively reveal the research frontier topics of each university, how
those topics evolve over time, and how well they fit the world research fronts.【Innovation/limitation】This paper proposes a
method for studying the frontier fields and evolution trends of university papers from the perspective of text topics. The main
limitation is that the coverage of the data set is not yet comprehensive.
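The topic-evolution step described in the abstract relies on cosine similarity between topics. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes each topic is represented as a sparse word-to-weight mapping (for example, the top words of a topic model). The function names `cosine_similarity` and `link_topics`, the threshold of 0.5, and the toy data are all hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two sparse topic vectors (word -> weight)."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(weight * weight for weight in p.values()))
    norm_q = math.sqrt(sum(weight * weight for weight in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        # An empty or all-zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_p * norm_q)


def link_topics(earlier, later, threshold=0.5):
    """Pair topics from an earlier period with similar topics from a later one.

    `threshold` is an illustrative cut-off (an assumption, not a value
    reported in the paper).
    """
    links = []
    for name_a, vec_a in earlier.items():
        for name_b, vec_b in later.items():
            sim = cosine_similarity(vec_a, vec_b)
            if sim >= threshold:
                links.append((name_a, name_b, sim))
    return links


# Toy topic-word weights for two consecutive time windows (invented data).
topics_2019 = {
    "t1": {"battery": 0.6, "lithium": 0.4},
    "t2": {"gene": 0.7, "protein": 0.3},
}
topics_2020 = {
    "u1": {"battery": 0.5, "electrode": 0.5},
    "u2": {"protein": 0.6, "gene": 0.4},
}

evolution_links = link_topics(topics_2019, topics_2020)
for source, target, sim in evolution_links:
    print(f"{source} -> {target}: {sim:.3f}")
```

Linking topics across adjacent time windows in this way gives one simple reading of "topic evolution": a later topic that is highly similar to an earlier one can be treated as its continuation, while later topics with no strong predecessor are candidates for newly emerging themes.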