情报科学 (Information Science) ›› 2021, Vol. 39 ›› Issue (10): 76-87.

• Theoretical Research •

Research on an Automatic Question Answering Model Based on a Multi-layer Heterogeneous Network

  

  • Online: 2021-10-01 Published: 2021-11-01

Abstract: 【Purpose/significance】This paper addresses two problems: the high cost of dataset construction when building automatic question answering systems, and the limitation that automatic question answering considers only the relevance of the question or the answer itself.【Method/process】It proposes a dataset construction method that fuses an annotated question-answer database with community question-answering data, constructs a multi-layer heterogeneous network model of question keywords, questions, answers, and answer clusters, and gives an automatic question answering algorithm based on this model. A library corpus was collected and processed as experimental data, and the BERT-Cos, AINN, and BiMPM models were used as baselines for experiments and analysis.【Result/conclusion】The experiments measure each model's performance on the library automatic question answering task. The proposed model outperforms the other models on all evaluation metrics and reaches an accuracy of 87.85%.【Innovation/limitation】The proposed multi-source dataset construction method and automatic question answering model perform better on the question answering task than existing methods; in addition, suggestions on the word length of user questions are given based on an analysis of model performance.
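
To make the layered structure named in the abstract concrete, the following is a minimal sketch of a question keyword, question, answer, answer cluster graph. It is not the paper's algorithm: the class name `LayeredQAGraph`, its methods, whitespace tokenization, and the keyword-overlap voting rule are all hypothetical stand-ins chosen only to illustrate how a query could traverse the four layers.

```python
from collections import defaultdict


class LayeredQAGraph:
    """Toy keyword -> question -> answer -> answer-cluster graph (illustrative only)."""

    def __init__(self):
        self.keyword_to_questions = defaultdict(set)  # layer 1 -> layer 2
        self.question_to_answer = {}                  # layer 2 -> layer 3
        self.answer_to_cluster = {}                   # layer 3 -> layer 4

    def add_pair(self, question, answer, cluster):
        # Assumption: keywords are simply the lowercase whitespace tokens.
        for kw in question.lower().split():
            self.keyword_to_questions[kw].add(question)
        self.question_to_answer[question] = answer
        self.answer_to_cluster[answer] = cluster

    def ask(self, query):
        # Each query keyword votes for every stored question that contains it.
        votes = defaultdict(int)
        for kw in query.lower().split():
            for q in self.keyword_to_questions.get(kw, ()):
                votes[q] += 1
        if not votes:
            return None
        # Highest vote count wins; ties go to the alphabetically first question.
        best = max(sorted(votes), key=votes.get)
        answer = self.question_to_answer[best]
        return answer, self.answer_to_cluster[answer]


graph = LayeredQAGraph()
graph.add_pair("how to renew a book", "Renew online via your account.", "borrowing")
graph.add_pair("library opening hours", "Open 8:00-22:00.", "facilities")
print(graph.ask("renew book"))
```

A real system would replace the overlap vote with learned relevance scores over the network; the sketch only shows how the answer-cluster layer lets a lookup return context beyond a single question-answer match.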