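Construction and Application of a “Target Data” Extraction Model for Academic Literature Based on the Fusion of Deep Learning and Requirement Rules: The Case of South China Sea Digital Resources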




Information Science


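1. School of Information Management, Nanjing University, Nanjing 210046; 2. School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026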

Combine Deep Learning with the Rules of Requirement to Construct the “Target Data” Extraction Model for the Academic Literature: Take the Resources of the South China Sea as an Example

1. School of Information Management, Nanjing University, Nanjing 210046, China; 2. School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China

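Abstract

【Purpose/Significance】 Extracting the target data that researchers need from the content of massive academic literature helps improve researchers' efficiency and also helps improve the retrieval services of current literature databases. 【Method/Process】 Guided by researchers' academic needs, target data are first extracted from a large body of academic literature using deep learning. Next, NER and TF-IDF are used to extract the “5W” rules of the target data, and the target data pass through a second layer of requirement-rule filtering: any data that satisfy the “5W” rules are identified as target data. Finally, a third layer of manual verification is applied, producing the final academic-literature “target data”. 【Results/Conclusions】 The accuracy of the “target data” extraction model constructed in this paper reaches up to 0.88. Combined with “5W” rule filtering and the final manual verification, it not only helps improve researchers' retrieval precision for academic literature but also, to some extent, supports the retrieval work of literature database institutions. 【Innovation/Limitations】 Fusing deep learning with requirement rules moves academic literature retrieval results from the level of bibliographic information to the level of the data within the literature content.

Keywords:

Deep learning; named entity recognition; bag-of-words model; TF-IDF; “5W” rule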

Abstract:

【Purpose/significance】 Extracting the target data that researchers need from massive academic literature content helps improve research efficiency and also improves the retrieval services of current literature databases. 【Method/process】 According to the academic needs of researchers, target data are first extracted from a large number of academic documents using deep learning methods. Second, NER and TF-IDF are used to extract the "5W" rules of the target data, and the target data are then filtered by a second layer of requirement rules: any data that meet the "5W" rules are identified as target data. Finally, a third layer of manual verification is performed on the target data, and the academic literature "target data" is generated. 【Result/conclusion】 The accuracy of the "target data" extraction model for academic literature constructed in this paper can reach 86.6%; together with "5W" rule filtering and the final manual verification, it not only improves researchers' retrieval precision for academic literature but also, to a certain extent, assists the retrieval work of literature database institutions. 【Innovation/limitation】 Combining deep learning with requirement rules moves the search results for academic literature from the level of bibliographic information to the level of the data within the literature content.

Key words:

Deep learning; Named entity recognition; Bag-of-words model; TF-IDF; "5W" rule
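
The layered filtering described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the deep learning layer is replaced by precomputed scores, NER is replaced by a tiny hypothetical place list (PLACES) and a year pattern (YEAR), and only three illustrative rule slots (where, when, what) are checked because the abstract does not spell out the "5W" rules. The function names tokenize, tf_idf, top_terms, passes_rules and extract_target_data are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: (count / len(doc)) * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def top_terms(weights, k=3):
    """Return the k highest-weighted terms of one document."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Assumption: stand-ins for NER output, not a real entity recognizer.
PLACES = {"south china sea", "nansha", "xisha"}
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def passes_rules(sentence, topic_terms):
    """Check three illustrative rule slots: where, when and what."""
    text = sentence.lower()
    tokens = set(tokenize(text))
    has_where = any(place in text for place in PLACES)
    has_when = YEAR.search(text) is not None
    has_what = any(term in tokens for term in topic_terms)
    return has_where and has_when and has_what


def extract_target_data(candidates, topic_terms, threshold=0.5):
    """Apply the first two layers to (sentence, score) pairs.

    Layer 1 keeps sentences whose (stubbed) model score reaches the threshold.
    Layer 2 keeps sentences that satisfy the rule filter.
    The third layer, manual verification, is left to a human reviewer.
    """
    likely = [sentence for sentence, score in candidates if score >= threshold]
    return [sentence for sentence in likely if passes_rules(sentence, topic_terms)]
```

In use, top_terms(tf_idf(docs)[i]) could supply the topic_terms for document i, and the list returned by extract_target_data would then go to manual verification. Running the rule filter only on sentences the model already accepted keeps the two layers independent, so either one can be replaced without touching the other.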

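PENG Yu-fang, CHEN Jiang-hao. Construction and Application of a “Target Data” Extraction Model for Academic Literature Based on the Fusion of Deep Learning and Requirement Rules: The Case of South China Sea Digital Resources [J]. Information Science.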

PENG Yu-fang, CHEN Jiang-hao. Combine Deep Learning with the Rules of Requirement to Construct the “Target Data” Extraction Model for the Academic Literature: Take the Resources of the South China Sea as an Example [J]. Information Science.