Information Science (情报科学), 2025, Vol. 43, Issue (1): 89-97.

• Theoretical Research •

Large Language Model-Driven Science and Technology Novelty Search: A Capability Evaluation

  

  • Online: 2025-01-05 Published: 2025-06-27

Abstract: 【Purpose/significance】Large language models (LLMs) carry multi-domain knowledge and can perform deep semantic understanding and generation of cross-domain text, which offers an opportunity to raise the level of automation in science and technology novelty search. This article explores the potential for applying LLMs in depth to novelty search work.【Method/process】The article evaluates LLM capabilities on the key stages of novelty search. First, four evaluation tasks are constructed: novelty point generation, keyword generation, method comparison, and conclusion comparison. Next, general-purpose LLMs of different parameter sizes are selected for evaluation, and different prompt templates are built to draw out the models' capabilities as fully as possible. Finally, the general-purpose models are compared with a model fine-tuned on a domain corpus for a deeper assessment.【Result/conclusion】The experimental results show that all of the models need few-shot learning to understand the task, and that the four-shot setting is more effective than the one-shot setting. Fine-tuning clearly improves results on the method comparison and conclusion comparison tasks, but brings no improvement on the novelty point generation and keyword generation tasks. ChatGLM3-6B, a model with billions of parameters, has insufficient long-text understanding and processing capabilities, whereas GPT-4 is relatively strong at long-text understanding and processing.【Innovation/limitation】The article confirms that LLMs have significant application potential in science and technology novelty search. Future work will refine this research to deepen the integration of LLMs with novelty search work.
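
The abstract contrasts one-shot and four-shot prompting. As a rough illustration only, the Python sketch below shows how a k-shot prompt for one of the four tasks might be assembled. The function name, the template wording, and the placeholder examples are all hypothetical; they are not taken from the article, whose actual prompt templates are not given here.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
    k: int,
) -> str:
    """Assemble a k-shot prompt: instruction, k worked examples, then the query.

    Hypothetical helper for illustration; the layout is an assumption,
    not the template used in the article.
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and the number of available examples")

    parts = [instruction]
    # Each demonstration pairs an input text with the desired output.
    for i, (source_text, target_text) in enumerate(examples[:k], start=1):
        parts.append(f"Example {i}\nInput: {source_text}\nOutput: {target_text}")
    # The query is left open-ended so the model completes the output.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Placeholder demonstrations (invented strings, not data from the article).
demo_examples = [
    ("project text A", "keywords A"),
    ("project text B", "keywords B"),
    ("project text C", "keywords C"),
    ("project text D", "keywords D"),
]
task_instruction = "Generate search keywords for the following project description."

one_shot = build_few_shot_prompt(task_instruction, demo_examples, "project text E", k=1)
four_shot = build_few_shot_prompt(task_instruction, demo_examples, "project text E", k=4)
```

The only difference between the two settings in this sketch is the value of `k`: the four-shot prompt gives the model more demonstrations of the expected output format before the query, at the cost of a longer input.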