Information Science (情报科学), 2023, Vol. 41, Issue 3: 155-163.

• Doctoral Forum •

Research on Automatic Extraction of Event Elements for South China Sea Narratives

  

  • Online: 2023-03-01  Published: 2023-04-10


Abstract: 【Purpose/significance】Extracting and organizing the event elements that identify South China Sea historical events is the basis
for compiling a chronicle of major South China Sea events and for telling China's South China Sea story well. 【Method/process】The paper
first summarizes the particular characteristics of South China Sea historical events and then discusses the specific dimensions of the
South China Sea narrative. On this basis, it defines criteria for dividing event elements, which give a standardized model of South China
Sea historical events, and proposes an automatic event element extraction method that combines rules with deep learning. Finally, an
empirical study on academic papers about the South China Sea verifies the effectiveness and efficiency of the method.
【Result/conclusion】The BERT+BiLSTM+CRF model outperforms the other comparison models, reaching a macro-averaged F1 (Macro_F1) of
87.73%. After the BERT+BiLSTM+CRF model is optimized with rule constraints, Macro_F1 rises to 88.76%, a good result, and the method
quickly and effectively extracts instances of every type of event element from generalized South China Sea historical event texts.
【Innovation/limitation】Drawing on the characteristics of South China Sea historical events, the study explores an automatic event
element extraction method for multi-dimensional South China Sea narratives and extracts each type of event element from academic
papers. Generalization experiments on more types of literature remain to be carried out.
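The abstract reports Macro_F1, i.e. the unweighted mean of per-type F1 over event element types: Macro_F1 = (1/|T|) Σ_t 2·P_t·R_t / (P_t + R_t), and says rule constraints improve the BERT+BiLSTM+CRF output. Neither the label scheme nor the rules are given here, so the following is only a minimal sketch under stated assumptions: BIO tagging, hypothetical element types TIME, LOC and ORG, and one illustrative rule (an I-X tag may only follow B-X or I-X). It is not the authors' implementation; it shows how such a rule can repair tagger output before entity-level macro-F1 is computed.

```python
from collections import defaultdict

# Hypothetical event element types; the paper's actual label set is not listed in the abstract.
TYPES = ("TIME", "LOC", "ORG")


def repair_bio(tags):
    """Illustrative rule constraint (assumed): I-X must follow B-X or I-X.

    An I-X tag that violates the rule is rewritten as B-X, so it starts a new span.
    """
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                tag = f"B-{etype}"
        fixed.append(tag)
        prev = tag
    return fixed


def spans(tags):
    """Collect (type, start, end) spans from a BIO sequence; end is exclusive.

    Stray I- tags that do not continue an open span of the same type are ignored.
    """
    result, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # continuation of the currently open span
        if etype is not None:
            result.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return result


def macro_f1(gold_seqs, pred_seqs, types=TYPES):
    """Entity-level F1 per type (exact span match), averaged without weighting."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = spans(gold), spans(pred)
        for etype, _, _ in p & g:
            tp[etype] += 1
        for etype, _, _ in p - g:
            fp[etype] += 1
        for etype, _, _ in g - p:
            fn[etype] += 1
    f1s = []
    for t in types:  # convention here: a type with no TP scores F1 = 0
        prec = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        rec = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Toy data (invented for illustration, not from the paper).
gold = [["B-TIME", "I-TIME", "O", "B-LOC", "O", "B-ORG"]]
pred_raw = [["B-TIME", "I-TIME", "O", "I-LOC", "O", "B-LOC"]]

print(f"{macro_f1(gold, pred_raw):.3f}")                          # 0.333
print(f"{macro_f1(gold, [repair_bio(p) for p in pred_raw]):.3f}")  # 0.556
```

In the toy example the repair turns the orphaned I-LOC into a correct LOC span, which raises LOC's F1 from 0 to 2/3. Real rule constraints for South China Sea event elements would likely be domain-specific (for example, patterns for dates or place names) and could also be applied inside CRF decoding rather than as post-processing.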