情报科学 (Information Science), 2025, Vol. 43, Issue 4: 52-61.

• Theoretical Research •

Research on Ancient Text Analysis Integrating Audio Features

  

  • Publication date: 2025-04-05  Release date: 2025-08-28


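Abstract: 【Purpose/Significance】This study fuses the textual and phonetic features of ancient Chinese texts so that the texts can be analyzed multimodally. 【Method/Process】First, BERT is used to extract text features, and the MFA (Montreal Forced Aligner) forced-alignment model together with the Librosa audio-processing library is used to extract audio features. Next, the text and audio features are fused in a multimodal fusion layer. Finally, the fused features are fed into a BiLSTM-CRF layer, which predicts and outputs the labels. Together these components form TAMAF, a model for analyzing ancient texts with audio features. 【Results/Conclusion】With suitable audio features incorporated, the proposed model outperforms the baseline models on all four downstream evaluation tasks, with maximum improvements of 8.54% for sentence segmentation, 0.21% for word segmentation, 0.97% for named entity recognition, and 0.85% for part-of-speech tagging. TAMAF therefore shows some advantages: it effectively captures interactions between modalities and improves the processing of ancient texts. 【Innovation/Limitations】Speech processing offers other audio features that express different physical properties, and these could be incorporated into the model and examined. In addition, audio and text features could be fused and made to interact more effectively on broader datasets.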
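The abstract names the components but not how frame-level audio features are matched to characters or how the two modalities are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' TAMAF code: it assumes the forced aligner yields one (start, end) interval in seconds per character, that frame-level audio features (such as those Librosa computes) are sampled every `hop_length / sr` seconds with no centering offset, that each character's audio vector is the mean of the frames inside its interval, and that fusion is plain concatenation with the character's BERT vector. The function names `frame_time`, `pool_audio_per_char`, and `fuse_concat` are hypothetical. Concatenation is only the simplest baseline; the abstract does not state which fusion operator TAMAF uses.

```python
from typing import List, Sequence, Tuple

# (start_seconds, end_seconds, character); assumed to come from forced alignment
Interval = Tuple[float, float, str]


def frame_time(index: int, sr: int, hop_length: int) -> float:
    """Time (s) of frame `index`, assuming one frame every hop_length samples, no offset."""
    return index * hop_length / sr


def pool_audio_per_char(
    frames: Sequence[Sequence[float]],
    intervals: Sequence[Interval],
    sr: int,
    hop_length: int,
) -> List[List[float]]:
    """Mean-pool frame-level audio features over each character's aligned interval."""
    dim = len(frames[0])
    pooled: List[List[float]] = []
    for start, end, _char in intervals:
        # Keep the frames whose timestamp falls inside [start, end).
        selected = [
            frame
            for i, frame in enumerate(frames)
            if start <= frame_time(i, sr, hop_length) < end
        ]
        if not selected:
            # No frame falls inside the interval (very short, or past the audio): zero vector.
            pooled.append([0.0] * dim)
        else:
            pooled.append([sum(col) / len(selected) for col in zip(*selected)])
    return pooled


def fuse_concat(
    text_vecs: Sequence[Sequence[float]],
    audio_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Fuse the modalities per character by concatenating text and audio vectors."""
    if len(text_vecs) != len(audio_vecs):
        raise ValueError("text and audio sequences must be aligned one-to-one")
    return [list(t) + list(a) for t, a in zip(text_vecs, audio_vecs)]
```

For example, with `sr=16` and `hop_length=4` (one frame every 0.25 s), frames `[[1.0], [3.0], [5.0], [7.0]]` and intervals `(0.0, 0.5, "學")` and `(0.5, 1.0, "而")` pool to `[[2.0], [6.0]]`, and each pooled vector is then appended to the corresponding text vector.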

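In the final stage, the CRF on top of the BiLSTM selects the highest-scoring tag sequence at inference time, which is done with Viterbi decoding. The following minimal pure-Python sketch assumes the emission scores (one row per character, as a BiLSTM would produce), the tag-transition scores, and the start scores are already available. The two-tag scheme for sentence segmentation, O (no break) and B (break after the character), and all numbers are illustrative assumptions, not values from the paper; `viterbi_decode` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best tag-index path and its score under a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting the sequence in tag j
    """
    n_tags = len(start)
    # Best score of any path ending in each tag at position 0.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for arriving in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

With tag 0 = O and tag 1 = B, emissions `[[0.0, 1.0], [0.0, 1.5]]`, a -5 penalty on the B to B transition, and zero start scores, the per-position argmax would be `[1, 1]`, but Viterbi returns `[0, 1]` with score 1.5: the transition scores let the CRF reject label sequences that are implausible as a whole. The sketch decodes one sequence at a time and omits CRF training (the forward algorithm and loss).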