Information Science ›› 2022, Vol. 40 ›› Issue (5): 90-96.

• Professional Research •

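Identifying Potential "Excellent" Papers in the Field of Artificial Intelligence Based on Decision Tree and Logistic Regression Models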

  

  • Online: 2022-05-01  Published: 2022-05-30

Abstract: [Purpose/Significance] Massive bodies of scientific literature contain many potential "excellent" papers that have not yet been recognized. Identifying and making use of these papers is a research problem of real practical significance. [Method/Process] Using original-paper and citation data for the field of artificial intelligence from 1990 to 2010 in the Web of Science database, this study constructs an original paper-citation feature vector space for the field and combines decision tree and logistic regression models for model training and for test application to the identification of potential "excellent" papers. [Result/Conclusion] The experimental results show that adding the feature "citations in the five years after publication" significantly improves the classification performance of both the decision tree and the logistic regression models, raising their identification accuracy to above 84% and 89% respectively, an improvement of more than 20 percentage points. The logistic regression model consistently outperforms the decision tree model, and tuning the hyperparameters of both models yields better identification results. In addition, early scientific research in the field of artificial intelligence was still at the stage of small-team collaboration, and the levels of funding support and open access for the field's literature were low. [Innovation/Limitation] Although the paper innovatively introduces machine learning methods to build and apply models for identifying potential "excellent" papers ("hidden treasures") among massive literature, the models still need to be extended to more disciplines.
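
The comparison described in the abstract (training the two classifiers with and without a "citations in the five years after publication" feature) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' data or code: the records are synthetic, the feature names `team_size`, `funded`, and `cites_5y` are hypothetical, a depth-1 decision stump stands in for a full decision tree, and the logistic regression is a plain gradient-descent version written with the standard library only.

```python
import math
import random


def make_papers(n, seed=0):
    """Synthetic, illustrative records: [team_size, funded, cites_5y] plus a 0/1 label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        excellent = 1 if rng.random() < 0.5 else 0
        team_size = rng.randint(1, 5)            # assumed unrelated to the label
        funded = 1 if rng.random() < 0.3 else 0  # assumed unrelated to the label
        cites_5y = max(0.0, rng.gauss(30 if excellent else 10, 8))
        rows.append([float(team_size), float(funded), cites_5y])
        labels.append(excellent)
    return rows, labels


def standardize(train, test):
    """Scale each column to zero mean and unit variance using training statistics."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression trained by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda xi: 1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0


def fit_stump(X, y):
    """Depth-1 decision tree: the single feature/threshold split with best training accuracy."""
    best = (-1, 0, 0.0, 1)  # (correct count, feature index, threshold, label if above)
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            for above in (0, 1):
                correct = sum((above if xi[j] > t else 1 - above) == yi
                              for xi, yi in zip(X, y))
                if correct > best[0]:
                    best = (correct, j, t, above)
    _, j, t, above = best
    return lambda xi: above if xi[j] > t else 1 - above


def accuracy(predict, X, y):
    return sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(y)


def evaluate(columns, n=400, seed=0):
    """Train both models on the chosen feature columns and score a held-out 25%."""
    rows, labels = make_papers(n, seed)
    X = [[r[c] for c in columns] for r in rows]
    split = int(0.75 * n)
    X_tr, X_te = standardize(X[:split], X[split:])
    y_tr, y_te = labels[:split], labels[split:]
    return {
        "decision_stump": accuracy(fit_stump(X_tr, y_tr), X_te, y_te),
        "logistic_regression": accuracy(fit_logistic(X_tr, y_tr), X_te, y_te),
    }


without_cites = evaluate(columns=[0, 1])    # team_size, funded only
with_cites = evaluate(columns=[0, 1, 2])    # plus cites_5y
print("without cites_5y:", without_cites)
print("with cites_5y:   ", with_cites)
```

Because the synthetic label is tied to `cites_5y` by construction, the gain this sketch prints only shows the shape of the comparison. It says nothing about the size of the effect reported in the paper.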