Information Science (情报科学) ›› 2025, Vol. 43 ›› Issue (9): 109-121.

• Professional Research •

Research on Interactive Scientific Diagram Understanding Based on Visual Attention Simulation

  

  • Publication date: 2025-09-05 Release date: 2025-12-12

Abstract: 【Purpose/significance】Scientific diagrams are an intuitive, structured way to present complex research findings. However, existing work on parsing scientific literature focuses mainly on text and has explored diagram interpretation only superficially. This paper takes the flowcharts and framework diagrams found in scientific figures as its research objects and explores understanding methods suited to such diagrams.【Method/process】The study proposes a two-stage approach to scientific diagram understanding. In the first stage, coarse bounding boxes are generated to localize diagram modules approximately. In the second stage, a visual attention point simulation strategy that combines global and local features constructs positive and negative sample points to optimize model performance, enabling modular parsing and semantic alignment.【Result/conclusion】Experiments on a dataset of 13,000 randomly generated flowcharts and on a manually annotated dataset of real flowcharts show that the proposed method outperforms existing models on both the mIoU and mrIoU metrics, reaching an overall mrIoU of 0.694. Even on complex flowcharts, the mrIoU reaches 0.608, more than 0.22 higher than the other models.【Innovation/limitation】The study provides a systematic solution for the interactive understanding of flowcharts in scientific diagrams and lays a theoretical foundation for multimodal interactive reading technologies.
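
The second stage and the mIoU metric can be illustrated with a minimal Python sketch. The abstract does not say how the positive and negative points are constructed, so the sampling rule here is an assumption: positive points are drawn from the coarse box shrunk toward its center, and negative points from a band just outside it. The helper names `sample_prompt_points`, `box_iou`, and `mean_iou`, and the `margin` parameter, are hypothetical. mIoU is taken here to be the average IoU over predicted/ground-truth box pairs; mrIoU is not defined in the abstract, so it is not sketched.

```python
import numpy as np


def sample_prompt_points(box, image_size, n_pos=3, n_neg=3, margin=0.2, seed=0):
    """Hypothetical helper: positive points inside a coarse box, negatives just outside.

    box is (x0, y0, x1, y1) in pixels; image_size is (width, height).
    The sampling rule is an illustrative assumption, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    width, height = image_size

    # Positive points: uniform inside the box shrunk by `margin` on every side.
    pos = np.column_stack([
        rng.uniform(x0 + margin * w, x1 - margin * w, n_pos),
        rng.uniform(y0 + margin * h, y1 - margin * h, n_pos),
    ])

    # Negative points: sample from the box expanded by `margin` (clipped to the
    # image) and keep only samples that fall outside the original box.
    ex0, ey0 = max(0.0, x0 - margin * w), max(0.0, y0 - margin * h)
    ex1, ey1 = min(float(width), x1 + margin * w), min(float(height), y1 + margin * h)
    neg = []
    for _ in range(1000):  # bounded retries, so a full-image box cannot loop forever
        if len(neg) == n_neg:
            break
        x, y = rng.uniform(ex0, ex1), rng.uniform(ey0, ey1)
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            neg.append((x, y))
    return pos, np.array(neg, dtype=float).reshape(-1, 2)


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds, targets):
    """Average IoU over paired predicted and ground-truth boxes."""
    return float(np.mean([box_iou(p, t) for p, t in zip(preds, targets)]))
```

In a point-prompted segmentation setup, the two returned arrays would typically be passed to the model with labels 1 (positive) and 0 (negative). Whether the paper uses boxes or masks when computing IoU is not stated in the abstract.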