Information Science (情报科学), 2021, Vol. 39, Issue (5): 3-11.

• Special Article •

Research on Sequence Labeling and Joint Entity-Relation Extraction for Financial-Domain Text

  

  • Online: 2021-05-01  Published: 2021-05-11


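Abstract: 【Purpose/significance】Entity-relation extraction in the financial domain is the foundation for building financial knowledge bases and plays an important role in making use of financial text. This paper proposes a joint entity-relation extraction model for the financial domain that adds recognition of the complex overlapping relations found in financial text and effectively avoids the propagation of errors between subtasks that occurs in traditional pipeline models. 【Method/process】We construct a high-quality financial text corpus and propose a new sequence tagging scheme together with entity-relation matching rules. On top of the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), we combine BiGRU (Bidirectional Gated Recurrent Units) and a CRF (Conditional Random Field) to build an end-to-end sequence labeling model that extracts entities and relations jointly. 【Result/conclusion】Experiments on financial-domain text show that the proposed joint extraction model reaches F1 scores of 0.627 for relation extraction and 0.543 for overlapping-relation extraction, a preliminary confirmation of the model's effectiveness for financial entity-relation extraction in a Chinese-language setting. 【Innovation/limitation】Drawing on the characteristics of financial text, we propose a new sequence tagging scheme and build a BERT-based joint entity-relation extraction model for the financial domain that recognizes overlapping relations between entities in financial text.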
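The abstract does not spell out the new tagging scheme or its matching rules, so the sketch below is only a point of reference, not the authors' method. It decodes a generic one-tag-per-token scheme in the style of Zheng et al. (2017), assuming tags such as `B-Holds-1` (entity position, relation type, entity role), and then applies a nearest-match rule that pairs each head entity with the closest tail entity of the same relation type. The tag format, the relation name `Holds`, and the functions `decode_spans` and `match_triples` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical tag format, modeled on Zheng et al. (2017), NOT the paper's own scheme:
#   "O"                    token outside any entity
#   "<POS>-<REL>-<ROLE>"   POS in {B, I, E, S}; REL is a relation type
#                          (must not contain "-"); ROLE is "1" (head) or "2" (tail)
Span = Tuple[str, str, str, int]  # (entity text, relation, role, start index)
Triple = Tuple[str, str, str]     # (head entity, relation, tail entity)


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIES-style tag sequence, dropping inconsistent ones."""
    spans: List[Span] = []
    start: Optional[int] = None
    open_label: Optional[Tuple[str, str]] = None  # (relation, role) of the open span
    for i, tag in enumerate(tags):
        if tag == "O":
            start, open_label = None, None
            continue
        pos, rel, role = tag.split("-")
        if pos == "S":  # single-token entity
            spans.append((tokens[i], rel, role, i))
            start, open_label = None, None
        elif pos == "B":  # open a new span
            start, open_label = i, (rel, role)
        elif pos == "I":  # continue only if the label matches the open span
            if open_label != (rel, role):
                start, open_label = None, None
        elif pos == "E":  # close the span if it is consistent
            if start is not None and open_label == (rel, role):
                spans.append(("".join(tokens[start:i + 1]), rel, role, start))
            start, open_label = None, None
    return spans


def match_triples(spans: List[Span]) -> List[Triple]:
    """Nearest-match rule: pair each head (role 1) with the closest tail (role 2)
    of the same relation type; ties go to the earlier tail."""
    triples: List[Triple] = []
    for head, rel, role, h_pos in spans:
        if role != "1":
            continue
        tails = [(text, p) for text, r, ro, p in spans if r == rel and ro == "2"]
        if tails:
            tail, _ = min(tails, key=lambda tp: abs(tp[1] - h_pos))
            triples.append((head, rel, tail))
    return triples


if __name__ == "__main__":
    # Toy sentence, character-tokenized: "Company A holds a controlling stake in Company B".
    tokens = list("甲公司控股乙公司")
    tags = ["B-Holds-1", "I-Holds-1", "E-Holds-1", "O", "O",
            "B-Holds-2", "I-Holds-2", "E-Holds-2"]
    print(match_triples(decode_spans(tokens, tags)))
    # → [('甲公司', 'Holds', '乙公司')]
```

Because every token carries exactly one tag, a scheme of this kind cannot express an entity that takes part in two relations at once, which is precisely the overlapping case the proposed model is designed to recognize.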

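The reported scores (0.627 overall, 0.543 on overlapping relations) are F1 values over extracted relations, but the abstract does not state the evaluation protocol. A common convention, assumed here, is micro-averaged precision, recall and F1 = 2PR/(P+R), where a predicted triple counts as correct only if its head entity, relation type and tail entity all match a gold triple, and where the overlapping score is computed on the subset of sentences in which some entity appears in more than one gold triple. The names `corpus_prf` and `has_overlap` and the toy entities A to E are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]              # (head entity, relation, tail entity)
Example = Tuple[Set[Triple], Set[Triple]]  # (gold triples, predicted triples) per sentence


def corpus_prf(examples: List[Example]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1; a prediction counts only if the
    whole (head, relation, tail) triple exactly matches a gold triple."""
    tp = n_pred = n_gold = 0
    for gold, pred in examples:
        tp += len(gold & pred)
        n_pred += len(pred)
        n_gold += len(gold)
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def has_overlap(gold: Set[Triple]) -> bool:
    """Assumed definition: a sentence has overlapping relations if some entity
    appears in two or more of its gold triples."""
    counts: Counter = Counter()
    for head, _, tail in gold:
        for entity in {head, tail}:  # a set, so a self-relation counts once
            counts[entity] += 1
    return any(c > 1 for c in counts.values())


if __name__ == "__main__":
    examples: List[Example] = [
        ({("A", "Holds", "B"), ("B", "Holds", "C")}, {("A", "Holds", "B")}),
        ({("D", "Invests", "E")}, {("D", "Invests", "E"), ("D", "Holds", "E")}),
    ]
    print(corpus_prf(examples))                                        # all sentences
    print(corpus_prf([ex for ex in examples if has_overlap(ex[0])]))  # overlapping subset
```

Counting true positives per sentence, rather than over one corpus-wide set, keeps an identical triple in two different sentences from being matched across sentences.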