Information Science (情报科学), 2021, Vol. 39, Issue (1): 13-20.

• Monograph •

Research on the Extraction of Method Knowledge Elements from Scientific and Technical Literature

  

  • Online:2021-01-01 Published:2021-01-25

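Abstract (translated from the Chinese): [Purpose/Significance] To accurately extract method knowledge elements from scientific and technical literature and enable finer-grained knowledge organization and retrieval.
[Method/Process] This study proposes a rule-based approach to extracting method knowledge elements. It consists of two stages: a semi-automated stage that identifies the initial description rules of method knowledge elements, and an automated stage that extracts and updates method knowledge elements and their description rules. In the first stage, the component dimensions and initial description rules of method knowledge elements are identified from their characteristics by combining manual and machine effort. In the second stage, method knowledge elements are extracted automatically from scientific literature using the initial description rules from the first stage, and new description rules are mined from the newly identified method knowledge elements with the PreFixSpan algorithm, so that both the method knowledge elements and their description rules are updated dynamically.
[Result/Conclusion] In a preliminary evaluation on 16 scientific papers, the P, R, and F values were 0.71, 0.80, and 0.73, respectively (all > 0.5), indicating that the method is feasible and effective. The extraction method may also serve as a reference for finer-grained knowledge organization and retrieval.
[Innovation/Limitation] A limitation of the method is that extracting the description rules of method knowledge elements requires some manual involvement.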

Abstract: [Purpose/significance] To accurately extract method knowledge elements (KEs) from scientific literature and achieve finer-grained knowledge organization and retrieval.
[Method/process] This study proposes a rule-based method for extracting method KEs from scientific literature. The method has two stages: a semi-automated stage that extracts the initial description rules of method KEs, and an automated stage that derives and updates method KEs along with additional description rules. The first stage semi-automatically extracts initial method KEs based on their description characteristics to obtain high-quality method KEs, and then summarizes their composition dimensions and initial description rules. This stage provides the data foundation for the next stage and gives further insight into the composition dimensions of method KEs. The second stage treats the initial rules as clue words, uses regular expressions to extract method KEs from text, and then derives additional rules with the PreFixSpan algorithm to supplement the initial rules.
[Result/conclusion] In a preliminary evaluation on 16 papers, the precision (P), recall (R), and F values for method KE extraction are 0.71, 0.80, and 0.73, respectively (all > 0.5), indicating the effectiveness of the method. The method may also serve as a reference for finer-grained knowledge organization and retrieval.
[Innovation/limitation] A limitation of the method is that extracting the description rules of method KEs requires manual intervention.
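
The second stage described in the abstract (clue words matched with regular expressions, followed by sequential pattern mining) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clue words in CLUE_WORDS, the function names extract_candidates and prefixspan, the naive sentence splitter, and the support threshold are all hypothetical, and the mining routine is a simplified single-item variant of the PrefixSpan algorithm (written PreFixSpan in the abstract).

```python
import re
from collections import defaultdict

# Hypothetical clue words standing in for the initial description rules;
# the paper's actual rules are not given in the abstract.
CLUE_WORDS = ["we propose", "is based on", "by using"]


def extract_candidates(text, clue_words=CLUE_WORDS):
    """Return the sentences that contain at least one clue word."""
    pattern = re.compile("|".join(re.escape(c) for c in clue_words), re.IGNORECASE)
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


def prefixspan(sequences, min_support):
    """Mine frequent sequential patterns made of single items.

    Returns a dict mapping each pattern (a tuple of items) to its support,
    i.e. the number of input sequences containing it as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) suffix pointers,
        # at most one per sequence.
        support = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                support[item].add(seq_id)
        for item, seq_ids in sorted(support.items()):
            if len(seq_ids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)
            # Project each suffix past the first occurrence of the item.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy usage: the text below is invented for illustration only.
text = (
    "We propose a method based on regular expressions. "
    "The corpus has 16 papers. "
    "We propose a model based on sequential patterns."
)
candidates = extract_candidates(text)
token_seqs = [s.lower().rstrip(".").split() for s in candidates]
patterns = prefixspan(token_seqs, min_support=2)
longest = max(patterns, key=len)
print(longest)  # ('we', 'propose', 'a', 'based', 'on')
```

In this toy run, word sequences shared by the matched sentences surface as frequent patterns, which is the kind of candidate a human or a later filter could promote to a new description rule. How the paper tokenizes text, sets the support threshold, or filters mined patterns is not stated in the abstract.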