情报科学 (Information Science), 2023, Vol. 41, Issue (7): 90-99.

• Business Research •

Classifying the Academic Functions of Scientific Entities by Fusing Inter-word Relations and CNN

  

  • Online: 2023-08-01 Published: 2023-08-22

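Abstract (from the Chinese version):

【Purpose/significance】 To clarify the semantic roles that scientific entities play in academic texts, and thereby build a knowledge structure of domain-specific terminology, this paper proposes a term classification method that uses the inter-word relations between scientific entities as its feature engineering. From the perspective of the semantic attributes of academic research, the scientific entities appearing in academic texts are divided into five classes: "research field", "research problem", "research method", "research tool", and "other". 【Method/process】 Using dependency parsing, for each sentence in the academic text that contains two or more scientific entities, the shortest dependency path between the entities is mined, the predicate components on that path are extracted as the relation between the entities, and a 2D matrix is constructed as the input of a convolutional neural network that classifies the entities. 【Result/conclusion】 The model was validated on academic literature in the field of "artificial intelligence" retrieved from Web of Science, reaching a precision of 89.38%, a recall of 92.46%, and an F1 value of 0.9089. 【Innovation/limitation】 The matrix built from scientific entity relations is sparse, which adversely affects computation speed; the relation extraction step also depends heavily on the output quality of the dependency parsing tool.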
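
The relation extraction step described in the method can be illustrated with a minimal sketch. The abstract does not name the parser or give the extraction code, so everything below is an assumption made for illustration: the example sentence, its hand-written dependency edges, the part-of-speech tags, and the helper names `shortest_dependency_path` and `predicates_on_path` are all hypothetical. The sketch treats the dependency tree as an undirected graph, finds the shortest path between two entities by breadth-first search, and keeps the verbs on that path as the candidate relation.

```python
from collections import deque

# Hypothetical, hand-written dependency parse of
# "We apply SVM to text classification" as (head, dependent) pairs.
# A real pipeline would take these edges from a dependency parser and
# would use token indices rather than token strings as node ids.
EDGES = [
    ("apply", "We"),
    ("apply", "SVM"),
    ("apply", "classification"),
    ("classification", "to"),
    ("classification", "text"),
]

# Hypothetical part-of-speech tags, used only to spot predicate (verb) tokens.
POS = {
    "We": "PRON", "apply": "VERB", "SVM": "PROPN",
    "to": "ADP", "text": "NOUN", "classification": "NOUN",
}


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over the dependency tree, treated as undirected."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the back-pointers to rebuild the path from source to target.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # the two tokens are not connected


def predicates_on_path(path, pos):
    """Keep the verb tokens on the path as the candidate relation."""
    return [token for token in path if pos.get(token) == "VERB"]


path = shortest_dependency_path(EDGES, "SVM", "classification")
relation = predicates_on_path(path, POS)
print(path, relation)
```

In this toy parse the two entities are linked through the verb "apply", which is the kind of predicate the method would record as the relation between them. How the paper maps such relations into its 2D matrix is not specified in the abstract and is not sketched here.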

Abstract:

【Purpose/significance】 To clarify the semantic role of scientific entities in academic texts, and thereby establish the knowledge structure of terminology in specific fields, this paper proposes a classification method for scientific entities that uses the relationships between entities as its feature engineering. From the perspective of the semantic attributes of academic research, the scientific entities appearing in academic texts are divided into "research field", "research problem", "research method", "research tool" and "other". 【Method/process】 Using dependency syntax analysis, for sentences with two or more entities in the academic text, the shortest dependency path between entities is mined, the predicate components on the shortest dependency path are extracted as the relationship between entities, the relationship between words is taken as the feature, and a 2D matrix is constructed as the input of the convolutional neural network to complete the classification of entities. 【Result/conclusion】 The model was verified on academic literature in the field of "artificial intelligence" downloaded from Web of Science, with a precision of 89.38%, a recall of 92.46%, and an F1 value of 0.9089. 【Innovation/limitation】 The matrix formed by the relationships between entities is a sparse matrix, which adversely affects calculation speed.
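
The three reported scores can be cross-checked against each other. The short sketch below assumes the reported F1 is the standard harmonic mean of precision and recall, which the abstract does not state explicitly. The variable names are illustrative.

```python
# Reported values from the abstract, written as fractions.
precision = 0.8938
recall = 0.9246

# Standard F1: harmonic mean of precision and recall (assumed definition).
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))
```

Under that assumption the recomputed value rounds to the reported 0.9089, so the three figures are mutually consistent.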