情报科学 (Information Science), 2021, Vol. 39, Issue (2): 34-43.

• Theoretical Research •

Research on an Extractive Text Summarization Model Based on Maximal Marginal Relevance

  

  • Online:2021-02-01 Published:2021-03-11

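Abstract (translated from the Chinese):

【Purpose/significance】To obtain summaries that are highly relevant to the source text and low in redundancy, this paper
proposes an unsupervised extractive text summarization model that incorporates deep learning.【Method/process】Building on the
Maximal Marginal Relevance (MMR) model, we use the word-embedding and sentence-embedding vector representations from deep
learning to compute the similarity between sentences, and rank sentences according to the influence of keywords and position
information on sentence importance, yielding high-quality summaries. The proposed model is applied to the dataset of the 2018
Byte Cup article-title generation task to verify its effectiveness.【Result/conclusion】For single-sentence summaries, the model
achieves a Rouge-L value of 28.24%, higher than the traditional extractive summarization algorithms CI (17.37%), TextRank
(22.70%) and MMR (23.52%); for multi-sentence summaries, it achieves a Rouge-L value of 37.78%, higher than CI (29.35%),
TextRank (34.15%) and MMR (31.09%). The results indicate that deep learning helps improve extractive text summarization.
【Innovation/limitation】The innovation lies in combining Maximal Marginal Relevance (MMR) with deep learning and jointly
considering features such as sentence-to-document similarity, keywords and position information when extracting summaries.
The limitation is that the study covers only extractive text summarization; future work will attempt to integrate abstractive
text summarization models.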

Abstract:

【Purpose/significance】To extract summaries with higher relevance and less redundancy, this article proposes an unsupervised
extractive text summarization model combined with deep learning.【Method/process】Based on the Maximal Marginal Relevance (MMR)
model, we calculate the similarity between sentences using word embeddings and sentence embeddings, and rank sentences according
to the influence of keywords and position information on sentence importance, to obtain higher-quality summaries. The model is
applied to the dataset of the 2018 Byte Cup article-title generation task to test its effectiveness.【Result/conclusion】The
experimental results show that the Rouge-L value for single-sentence summaries is 28.24%, which is higher than that of
traditional extractive text summarization algorithms, i.e. CI (17.37%), TextRank (22.70%) and MMR (23.52%). The Rouge-L value
for multi-sentence summaries is 37.78%, which is higher than that of traditional extractive text summarization algorithms,
i.e. CI (29.35%), TextRank (34.15%) and MMR (31.09%). The experimental results indicate that leveraging deep learning can
improve the quality of extractive text summaries.【Innovation/limitation】The innovation is that we combine Maximal Marginal
Relevance (MMR) with deep learning and jointly consider three kinds of sentence features (similarity to the full text,
keywords, and position information) to obtain extractive summaries. The limitation is that the scope of our research is
limited to extractive text summarization; abstractive text summarization models will be integrated into our model in further
research.
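
The abstract does not give the model's scoring formula, so the following is only a minimal sketch of the standard MMR selection criterion that the model builds on, not the authors' implementation. It assumes each sentence and the whole document are already represented as embedding vectors; the function names `cosine` and `mmr_select`, the trade-off parameter `lam`, and the toy vectors are all hypothetical, and the keyword and position weighting described in the abstract is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(sentence_vecs, doc_vec, k, lam=0.7):
    """Greedily pick up to k sentence indices with the standard MMR criterion.

    score(i) = lam * sim(sentence_i, document)
               - (1 - lam) * max over selected j of sim(sentence_i, sentence_j)
    """
    relevance = [cosine(v, doc_vec) for v in sentence_vecs]
    selected = []
    candidates = list(range(len(sentence_vecs)))

    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(sentence_vecs[i], sentence_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy 2-d vectors standing in for sentence embeddings (made up for illustration).
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
doc = [1.0, 1.0]
picked = mmr_select(vecs, doc, k=2, lam=0.5)
print(picked)  # [1, 2]
```

In this toy run, sentence 1 is chosen first because it is closest to the document vector; sentence 0 is then passed over because it is nearly identical to sentence 1, so the less redundant sentence 2 is chosen instead. Setting `lam` to 1.0 turns the redundancy penalty off and ranks by relevance alone.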