Information Science (情报科学), 2024, Vol. 42, Issue 9: 82-90.

• Business Research •

Research on Automatic Generation of Scientific Literature Reviews Based on Functional Sentence Recognition

  

  • Publication date: 2024-09-01; release date: 2024-11-06


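Abstract: 【Purpose/significance】To address the knowledge-acquisition difficulties caused by the explosive growth of scientific literature, this study constructs a method for automatically generating scientific literature reviews based on functional sentence recognition, offering a new approach to automatic review generation together with a validation of that approach.【Method/process】First, LSTM, TextCNN, BERT and other models were trained for functional sentence recognition; the best-performing model, LSTM (F1 = 0.90), was selected and combined with feature-rule matching to extract functional sentences of different categories. Second, the candidate functional sentences were processed through score-based filtering, topic clustering, and refinement with a generative large language model. Finally, the processed content was filled into a purpose-built general review template to produce the target literature review.【Result/conclusion】The functional sentence recognition model and the feature-rule matching method accurately extracted functional sentences of different categories from scientific literature, and the combined strengths of the feature rules and the generative large language model translated into higher recognition accuracy, richer content, and better readability. The method ultimately generated a review on the topic of "citation analysis".【Innovation/limitation】Concept sentences and limitation sentences in scientific literature could not be recognized automatically through model training, and the scoring rules used to assess the importance of candidate functional sentences were not grounded in deeper semantic relations, so more precise and standardized rules remain to be formulated.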
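The abstract does not publish the feature rules or the scoring formula, so the following is only a minimal sketch of how rule matching can be combined with a classifier's label and how scored candidates can be filtered per category. Everything concrete here is an assumption: the English cue phrases in `CUE_RULES`, the category names, the 70/30 weighting in `score`, and the helper names are hypothetical, and the trained LSTM is not reproduced (its predicted label is simply passed in as `model_label`).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical cue-phrase rules per functional-sentence category.
# The paper's actual feature rules are not given in the abstract.
CUE_RULES: Dict[str, List[str]] = {
    "concept": [r"\bis defined as\b", r"\brefers to\b"],
    "limitation": [r"\blimitations?\b", r"\bfails? to\b", r"\bfuture work\b"],
    "method": [r"\bwe (propose|use|employ|apply)\b"],
    "result": [r"\bresults? show\b", r"\bwe find\b", r"\boutperforms?\b"],
}


@dataclass
class Candidate:
    text: str
    category: str
    score: float


def match_rules(sentence: str) -> Optional[str]:
    """Return the first category whose cue patterns match, else None."""
    lowered = sentence.lower()
    for category, patterns in CUE_RULES.items():
        if any(re.search(p, lowered) for p in patterns):
            return category
    return None


def classify(sentence: str, model_label: Optional[str] = None) -> Optional[str]:
    """Rules win when they fire (in this sketch they are what catch concept and
    limitation sentences); otherwise fall back to the classifier's label."""
    return match_rules(sentence) or model_label


def score(sentence: str, topic_terms: Set[str], position: float) -> float:
    """Hypothetical importance score: 70% topic-term coverage, 30% earliness.
    `position` is the sentence's relative position in its paper, in [0, 1]."""
    if not topic_terms:
        return 0.0
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    coverage = len(tokens & topic_terms) / len(topic_terms)
    return 0.7 * coverage + 0.3 * (1.0 - position)


def filter_top_k(cands: List[Candidate], k: int) -> Dict[str, List[Candidate]]:
    """Keep the k highest-scoring candidates in each category."""
    kept: Dict[str, List[Candidate]] = {}
    for cand in sorted(cands, key=lambda c: c.score, reverse=True):
        bucket = kept.setdefault(cand.category, [])
        if len(bucket) < k:
            bucket.append(cand)
    return kept
```

Letting rules override the model is one simple way to combine the two; a real system might instead let rules fill only the categories the classifier was not trained on.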

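The later stages of the pipeline, topic clustering and template filling, can be sketched in the same hedged spirit. The abstract names neither the clustering algorithm nor the template's sections, so this sketch assumes a greedy single-pass Jaccard-overlap clustering as a stand-in, a hypothetical four-section `REVIEW_TEMPLATE`, and a `polish` placeholder marking where a generative large language model call would go (it returns its input unchanged here).

```python
from __future__ import annotations

import re
from string import Template
from typing import Dict, List, Set

# Small illustrative stopword list (hypothetical, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to",
             "is", "are", "we", "this", "with", "by"}

# Hypothetical general-purpose review template.
REVIEW_TEMPLATE = Template(
    "1 Concepts\n$concept\n\n"
    "2 Methods\n$method\n\n"
    "3 Findings\n$result\n\n"
    "4 Limitations\n$limitation\n"
)


def terms(sentence: str) -> Set[str]:
    """Lowercased alphabetic tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sentences: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Greedy single pass: each sentence joins the first topic whose vocabulary
    overlaps it by at least `threshold` (Jaccard), else starts a new topic."""
    clusters: List[List[str]] = []
    vocab: List[Set[str]] = []
    for sentence in sentences:
        t = terms(sentence)
        for i, topic in enumerate(vocab):
            if jaccard(t, topic) >= threshold:
                clusters[i].append(sentence)
                vocab[i] = topic | t  # widen the topic's vocabulary
                break
        else:  # no existing topic was similar enough
            clusters.append([sentence])
            vocab.append(t)
    return clusters


def polish(text: str) -> str:
    """Placeholder for the generative-LLM refinement step; returns text unchanged."""
    return text


def render_section(sentences: List[str]) -> str:
    """One paragraph per topic cluster, then the (placeholder) refinement pass."""
    paragraphs = [" ".join(group) for group in cluster(sentences)]
    return polish("\n".join(paragraphs)) if paragraphs else "(none extracted)"


def build_review(by_category: Dict[str, List[str]]) -> str:
    sections = ("concept", "method", "result", "limitation")
    return REVIEW_TEMPLATE.substitute(
        {k: render_section(by_category.get(k, [])) for k in sections}
    )
```

For example, `cluster(["citation network analysis", "co-citation network analysis", "deep learning models"])` groups the first two sentences into one topic and puts the third in its own. A single-pass clustering keeps the sketch dependency-free; an embedding-based method would likely group paraphrases better.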