情报科学 (Information Science) ›› 2021, Vol. 39 ›› Issue (10): 165-169.

• Professional Research •

Research on Distantly Supervised Entity Relation Extraction

  

  • Online: 2021-10-01 Published: 2021-11-01

Abstract: 【Purpose/significance】Entity relation extraction is foundational work for constructing domain ontologies and knowledge graphs and for developing question answering systems. Distant supervision aligns large-scale unstructured text with the entities of an existing knowledge base and automatically annotates training samples. This removes the time-consuming, laborious manual labeling of training corpora that supervised machine learning methods require, but it also introduces data noise.【Method/process】This paper reviews in detail the methods of recent years that combine distant supervision with deep learning to reduce noise in training samples and improve entity relation extraction performance.【Result/conclusion】Convolutional neural networks are better at capturing the local and key features of a sentence, and long short-term memory (LSTM) networks are better at handling long-distance dependencies between the entities of a pair within a sentence. These models extract lexical and syntactic features of sentences automatically; attention mechanisms assign greater weight to the key context and words of a sentence; and integrating prior knowledge into the neural network model enriches the semantic information of sentence entity pairs, significantly improving relation extraction performance.【Innovation/limitation】Future research should consider methods for handling overlapping relations and long-tail semantic relations of entity pairs, so as to address noise in entity-pair relations more comprehensively.
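The alignment step described in the abstract, and the noise it produces, can be illustrated with a minimal sketch. It is not taken from the paper: the toy knowledge base `KB`, the example sentences, and the helper `distant_label` are all hypothetical, and plain substring matching stands in for real entity linking.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Steve Jobs", "Apple"): "founder_of",
    ("Beijing", "China"): "capital_of",
}

# Hypothetical unlabeled corpus.
SENTENCES = [
    "Steve Jobs co-founded Apple in 1976.",
    "Steve Jobs ate an Apple pie at lunch.",  # mentions both names, not the relation
    "Beijing is the capital of China.",
    "Paris is the capital of France.",        # entity pair not in the KB
]


def distant_label(sentences, kb):
    """Group sentences into bags keyed by (head, tail, relation).

    Any sentence that mentions both entities of a KB pair is assumed to
    express that pair's relation. This assumption is the source of noise.
    """
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            # Naive substring matching stands in for real entity linking.
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


bags = distant_label(SENTENCES, KB)
for (head, tail, relation), bag in bags.items():
    print(f"{relation}({head}, {tail}): {len(bag)} sentence(s)")
```

In this toy run the second sentence lands in the `founder_of` bag even though it does not express that relation. This is the kind of wrongly labeled sample that the surveyed noise-reduction methods aim to down-weight or filter.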