情报科学 (Information Science), 2022, Vol. 39, Issue 1: 141-147.

• Practice Research •

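Construction and Application of a "Target Data" Extraction Model for Academic Literature Based on the Fusion of Deep Learning and Requirement Rules: A Case Study of South China Sea Digital Resources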

  

  • Publication date: 2022-01-01 · Release date: 2022-01-13

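Abstract (translated from the Chinese original): [Purpose/significance] Extracting the target data that researchers need from the content of massive academic literature helps improve researchers' efficiency on the one hand and helps improve the retrieval services of current literature databases on the other. [Method/process] Guided by researchers' academic needs, target data is first extracted from a large body of academic literature with deep learning methods. NER and TF-IDF are then used to extract the "5W" rules of the target data, and the target data passes through a second layer of requirement-rule filtering: any data that satisfies the "5W" rules is identified as target data. Finally, the target data undergoes a third layer of manual verification, which produces the final academic literature "target data". [Result/conclusion] The accuracy of the academic literature "target data" extraction model built in this paper reaches up to 0.88. Combined with "5W" rule filtering and final manual verification, it not only helps improve the precision of researchers' academic literature searches but also, to some extent, assists the retrieval work of literature database institutions. [Innovation/limitation] Fusing deep learning with requirement rules moves academic literature retrieval results from the level of bibliographic information to the level of data within the literature content.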

Abstract: [Purpose/significance] Extracting the target data that researchers need from the content of massive academic literature is conducive to improving researchers' efficiency on the one hand and to improving the retrieval services of current literature databases on the other. [Method/process] According to the academic needs of scientific researchers, target data is first extracted from a large number of academic documents through deep learning methods. Second, NER and TF-IDF are used to extract the "5W" rules of the target data, and the target data is then filtered by a second layer of requirement rules: any data that meets the "5W" rules is identified as target data. Finally, a third layer of manual verification is performed on the target data, and the academic literature "target data" is generated. [Result/conclusion] The accuracy of the academic literature "target data" extraction model constructed in this paper can reach 86.6%. Combined with "5W" rule filtering and final manual verification, it not only improves the precision of academic literature searches by scientific researchers but also, to a certain extent, assists the retrieval work of literature database institutions. [Innovation/limitation] Combining deep learning with requirement rules moves the search results for academic literature from the level of bibliographic information to the level of data within the literature content.
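The abstract does not say how the "5W" rules are represented or matched, so the following is only a minimal sketch of what the second-layer rule filter might look like, not the authors' implementation. It assumes that a rule is a set of terms per slot (for example what, where, when), that the "what" terms come from a plain TF-IDF ranking of requirement statements, and that the "where" and "when" terms would be supplied by an NER step, which is replaced here by hand-written sets. All function names, slot names, and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (TF * log(N / DF))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def top_terms(weights, k):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def passes_rule(tokens, rule):
    """A candidate passes when it shares at least one term with every slot."""
    token_set = set(tokens)
    return all(token_set & slot_terms for slot_terms in rule.values())


def filter_candidates(candidates, rule):
    """Keep only the candidates that satisfy every slot of the rule."""
    return [c for c in candidates if passes_rule(c, rule)]


# Hypothetical, already tokenized requirement statements.
requirements = [["sea", "temperature", "sea"], ["sea", "salinity"]]
# "what" slot: the top TF-IDF term of each requirement statement.
what_terms = set().union(*(top_terms(w, 1) for w in tf_idf(requirements)))

rule = {
    "what": what_terms,
    "where": {"south_china_sea"},  # assumption: stand-in for NER location output
    "when": {"2015", "2016"},      # assumption: stand-in for NER date output
}

# Hypothetical first-layer output: tokenized candidate sentences.
candidates = [
    ["sea", "temperature", "south_china_sea", "2015"],
    ["sea", "salinity", "2015"],
    ["method", "described", "below"],
]

kept = filter_candidates(candidates, rule)
print(kept)
```

Candidates that survive this filter would then go to the third layer, manual verification, which is outside the scope of the sketch.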