Information Science ›› 2022, Vol. 40 ›› Issue (4): 71-78.

• Professional Research •

Research on Algorithm Term Extraction and Innovation Evolution Path Construction Based on the BERT-BiLSTM-CRF Model

  

  • Online: 2022-04-01 Published: 2022-05-15

Abstract: [Purpose/Significance] Extracting algorithm terms from massive paper metadata and constructing the innovation evolution relationships among them supports the effective management and use of algorithms, helping researchers work more efficiently and adopt cutting-edge results. [Method/Process] First, abstracts of papers on the GAN algorithm serve as the corpus; algorithm terms are annotated by combining manual annotation with rule-based extraction, and the BERT-BiLSTM-CRF model is used to extract algorithm terms automatically. Then, the resulting model is applied to the cited-literature metadata of LDA algorithm papers to extract algorithm terms, and the innovation evolution path of the LDA algorithm is extracted from the cited content and constructed according to rule-based judgment and citation relationships. [Result/Conclusion] In the algorithm term extraction experiment on GAN papers, precision, recall, and F1 score reach 0.81, 0.63, and 0.71, respectively, and the innovation evolution path of the LDA algorithm is successfully constructed using relation extraction. The method can effectively support the construction of algorithm evolution networks and algorithm retrieval and tracking, and it enriches research on innovation diffusion theory. [Innovation/Limitation] The study extends the application field of named entity recognition and offers a useful approach to managing computer algorithms. The method for constructing innovation evolution paths can be optimized in future work.
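
The reported scores (0.81, 0.63, 0.71) are consistent with F1 being the harmonic mean of the first two values. The sketch below checks that arithmetic and shows one common way such scores are computed for sequence labeling: exact-match scoring of entity spans decoded from BIO tags. The abstract does not state the tagging scheme or the matching criterion, so the BIO scheme, the exact-match rule, the `ALG` label, and all function names here are assumptions made for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def extract_spans(tags):
    """Collect (start, end, label) spans from a BIO tag sequence; end is exclusive."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_prf(gold_tags, pred_tags):
    """Entity-level precision, recall, F1 under exact span matching (an assumption)."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)  # spans that match in both boundaries and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_score(0.81, 0.63), 2)

# Toy example with a hypothetical "ALG" (algorithm term) label.
gold = ["B-ALG", "I-ALG", "O", "B-ALG", "O"]
pred = ["B-ALG", "I-ALG", "O", "O", "B-ALG"]
toy_scores = entity_prf(gold, pred)
```

Exact span matching is the stricter choice: a prediction that overlaps a gold term but misses one token counts as both a false positive and a false negative, which tends to lower recall on long multi-word algorithm names.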