Archives Science Study  2023, Vol. 37 Issue (5): 140-148    DOI: 10.16065/j.cnki.issn1002-1620.2023.05.017
Section: Archival Informatization
Research on the Construction of Knowledge Graph of Yearbook Memorabilia Based on Unified Information Extraction Model: Take Forestry Memorabilia Knowledge Graph as an Example
ZHOU Zexu (1), HAN Hongqi (1), ZHANG Junsheng (1), ZHOU Xiao (2), XU Ziyan (1)
(1) Institute of Scientific and Technical Information of China, Beijing 100038
(2) School of Economics and Management, Xidian University, Xi'an 710126
Abstract

Memorabilia (chronicles of major events) have distinctive characteristics, and many industries need to digitize them. This paper proposes a method that uses a knowledge graph to organize the information in memorabilia, so that the multiple subjects in domain yearbook memorabilia can be extracted effectively, the complex relationships among them clarified, and the results used for reference and publicity work. First, based on the characteristics of the various events recorded in memorabilia, events are classified and conceptually modeled, and an overall scheme for the information processing of memorabilia is designed. Second, the top-level design of the knowledge graph is carried out, and a unified information extraction model is selected for information extraction, followed by knowledge graph construction and storage. Finally, the forestry memorabilia section of the China Forestry Yearbook is taken as an example to illustrate the construction process and application of the knowledge graph. The proposed process and method can transform memorabilia into a structured knowledge base using only a small amount of manually annotated data, which makes the event information recorded in memorabilia efficient and convenient to retrieve and provides a basis for mining the value of memorabilia archival texts.

Keywords: unified information extraction model; information extraction; knowledge graph; memorabilia
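As a rough illustration of the step from extracted events to a structured, queryable knowledge base, the sketch below flattens event records into (subject, predicate, object) triples and indexes them by entity. It is a minimal sketch only: the record layout, the role names (`date`, `organizer`, `issuer`, `location`), the function names, and the sample events are all hypothetical and invented for illustration. They are not taken from the paper, from the China Forestry Yearbook, or from the output format of any particular extraction model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def event_to_triples(event_id: str, event_type: str,
                     arguments: Dict[str, List[str]]) -> List[Triple]:
    """Flatten one extracted event into triples centred on the event node."""
    triples: List[Triple] = [(event_id, "event_type", event_type)]
    for role, values in arguments.items():
        for value in values:
            triples.append((event_id, role, value))
    return triples


def index_by_object(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each object value to the (event, role) pairs that mention it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        index[obj].append((subject, predicate))
    return dict(index)


# Invented sample records (assumption: one record per memorabilia entry).
SAMPLE_EVENTS = [
    {"id": "event-001", "type": "meeting",
     "arguments": {"date": ["2020-01-15"],
                   "organizer": ["Agency A"],
                   "location": ["City B"]}},
    {"id": "event-002", "type": "policy_release",
     "arguments": {"date": ["2020-03-02"],
                   "issuer": ["Agency A"]}},
]

ALL_TRIPLES = [
    triple
    for event in SAMPLE_EVENTS
    for triple in event_to_triples(event["id"], event["type"], event["arguments"])
]
OBJECT_INDEX = index_by_object(ALL_TRIPLES)

if __name__ == "__main__":
    # Retrieve every event in which "Agency A" plays some role.
    for event_id, role in OBJECT_INDEX["Agency A"]:
        print(event_id, role)
```

Modelling each event as its own node, with arguments attached as labelled edges, is one common way to let a single entity be looked up across many entries. A graph database could hold the same triples, but that choice is outside this sketch.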
Published: 2024-10-28
Corresponding author: HAN Hongqi
Cite this article:

ZHOU Zexu, HAN Hongqi, ZHANG Junsheng, ZHOU Xiao, XU Ziyan. Research on the Construction of Knowledge Graph of Yearbook Memorabilia Based on Unified Information Extraction Model: Take Forestry Memorabilia Knowledge Graph as an Example. Archives Science Study, 2023, 37(5): 140-148.

Link to this article:

http://journal12.magtechjournal.com/Jwk_dax/CN/10.16065/j.cnki.issn1002-1620.2023.05.017      or      http://journal12.magtechjournal.com/Jwk_dax/CN/Y2023/V37/I5/140
