Information Science ›› 2021, Vol. 39 ›› Issue (8): 94-102.

• Professional Research •

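A Comparative Study of Deep Learning-Based Methods for Structure Function Recognition in Multi-Disciplinary, Multi-Level Academic Papers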

  

  • Publication date: 2021-08-01 Release date: 2021-08-05


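Abstract: [Purpose/significance] The structure function of an academic paper is a concentrated expression of its discourse structure and semantic content. Current research on the structure function of academic papers focuses on two aspects: recognizing it at different levels of a paper, and examining the applicability of models and algorithms from the perspective of disciplinary differences. Comparative research on the intrinsic relations among models, disciplines, and levels is lacking.
[Method/process] Academic papers published in authoritative Chinese-language journals in traditional Chinese medicine, library and information science, computer science, environmental science, and botany were selected as the experimental corpus. Deep learning models including CNN, LSTM, and BERT were then used to recognize the structure functions of these papers at the sentence, paragraph, and chapter-content levels.
[Result/conclusion] The experimental results show that BERT achieves the best structure function recognition across all disciplines and all levels of academic papers; every model performs best at the chapter-content level in every discipline; and papers in traditional Chinese medicine are recognized better than those in the other disciplines. In addition, confusion matrices are used to show the specific cases in which structure functions are misidentified in each discipline, and the causes of these errors are analyzed.
[Innovation/limitation] This study provides first-hand empirical data for research on the structure function recognition of academic papers.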
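The abstract states that recognition is run at the sentence, paragraph, and chapter-content levels but does not say how the units at each level are built. The sketch below is one minimal way to do it, assuming that each chapter carries a single structure-function label and that its paragraphs and sentences inherit that label. The label names in `LABELS`, the `Instance` type, and the helpers `split_sentences` and `build_instances` are hypothetical illustrations, not the authors' pipeline.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set: the abstract does not list the paper's actual
# structure-function categories, so these five are an illustrative assumption.
LABELS = ("introduction", "related_work", "method", "experiment", "conclusion")


@dataclass
class Instance:
    level: str  # "chapter", "paragraph", or "sentence"
    text: str
    label: str


def split_sentences(paragraph: str) -> list[str]:
    """Naively split after Chinese or English sentence-final punctuation.

    This is an assumption for illustration; it also splits on decimals and
    abbreviations, so a real pipeline would use a proper segmenter.
    """
    parts = re.split(r"(?<=[。！？.!?])\s*", paragraph)
    return [p for p in parts if p.strip()]


def build_instances(chapters: list[tuple[str, list[str]]]) -> list[Instance]:
    """Turn one paper, given as (label, [paragraph, ...]) pairs, into
    labeled units at all three levels (assumed label inheritance)."""
    out: list[Instance] = []
    for label, paragraphs in chapters:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        out.append(Instance("chapter", "\n".join(paragraphs), label))
        for para in paragraphs:
            out.append(Instance("paragraph", para, label))
            for sent in split_sentences(para):
                out.append(Instance("sentence", sent, label))
    return out


if __name__ == "__main__":
    # Toy paper, not data from the study.
    paper = [
        ("introduction", ["背景一。背景二。", "Motivation here."]),
        ("method", ["We use BERT. It works."]),
    ]
    instances = build_instances(paper)
    print(Counter(inst.level for inst in instances))
```

Grouping the resulting instances by `level` and training the same classifier (CNN, LSTM, or BERT) on each group is one way to set up a level-by-level comparison like the one the abstract reports.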

Abstract: [Purpose/significance] The structure function of an academic text reflects its textual structure and semantic content.
Current research mainly focuses on recognizing structure functions across disciplines and at multiple levels, and comparative research on the relations among models, disciplines, and levels is lacking.
[Method/process] This paper built a multi-disciplinary experimental corpus covering traditional Chinese medicine, library and information science, computer science, environmental science, and botany. The structure functions of academic texts were recognized at the sentence, paragraph, and chapter-content levels using the deep learning models CNN, LSTM, and BERT. [Result/conclusion] The experimental results show that BERT performs best in the multi-disciplinary, multi-level setting; all deep learning models perform best at the chapter-content level across disciplines; and the recognition of academic texts in traditional Chinese medicine is much better than that of the other disciplines. In addition, confusion matrices are used to reveal misidentification errors, and the causes of these errors are analyzed. [Innovation/limitation] This study provides first-hand empirical data for
research on academic text structure function recognition.
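The abstract uses confusion matrices to locate which structure functions are mistaken for which. A minimal sketch of that analysis follows: it tabulates gold versus predicted labels and lists the largest off-diagonal cells as the most frequent misidentifications. Macro-averaged F1 is included only as one possible per-matrix summary for comparing models, disciplines, or levels; the abstract does not name the metric used. The function names and toy labels are hypothetical.

```python
def confusion_matrix(gold: list[str], pred: list[str], labels: list[str]) -> list[list[int]]:
    """Count instances by (gold, predicted) label: rows are gold, columns are predicted."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


def top_confusions(matrix: list[list[int]], labels: list[str], k: int = 3) -> list[tuple[str, str, int]]:
    """Return up to k nonzero off-diagonal cells as (gold, predicted, count),
    most frequent first; ties keep row-major order because sort is stable."""
    cells = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    cells.sort(key=lambda cell: cell[2], reverse=True)
    return cells[:k]


def macro_f1(matrix: list[list[int]]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) (assumed metric)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # column total minus hits
        fn = sum(matrix[c]) - tp  # row total minus hits
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    # Toy predictions, not results from the study.
    labels = ["introduction", "method", "conclusion"]
    gold = ["introduction", "introduction", "method", "method", "method", "conclusion"]
    pred = ["introduction", "method", "method", "method", "conclusion", "conclusion"]
    m = confusion_matrix(gold, pred, labels)
    for lab, row in zip(labels, m):
        print(f"{lab:>12}", row)
    print(top_confusions(m, labels))
    print(round(macro_f1(m), 3))
```

Reading the matrix row by row shows, for each true structure function, where its misrecognized instances went, which is the kind of per-discipline error breakdown the abstract describes.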