Information Science (情报科学), 2024, Vol. 42, Issue 6: 83-88.

• Business Research •

Research on Synonymous Sentence Recognition Integrating Machine Translation and BERT-Whitening

  

  • Online: 2023-06-01 Published: 2024-07-31

Abstract: 【Purpose/significance】 A synonymous sentence recognition model that combines machine translation with BERT-Whitening can improve recognition performance and support downstream information resource management and service applications. 【Method/process】 The types and characteristics of synonymous sentences are first analyzed. On this basis, a recognition model integrating machine translation and BERT-Whitening is constructed, and its effectiveness is verified through experiments. The model consists of four components: sentence preprocessing, candidate synonymous sentence identification, embedded text representation, and synonymy judgment based on similarity fusion. 【Result/conclusion】 The experimental results show that the combined model achieves a precision, recall, and F1 of 0.840, 0.859, and 0.849, respectively, clearly higher than the control group. 【Innovation/limitation】 The model has not been validated on text from highly specialized domains, so its generality is insufficiently verified, and there remains considerable room for improvement in precision and recall.
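The abstract names BERT-Whitening as the embedding step but gives no implementation details. As a rough illustration only, the whitening transform as it is commonly described (subtract the mean, then rotate and rescale so the covariance becomes the identity) might be sketched as below. The function names, the optional dimension cut `k`, and the use of plain cosine similarity are assumptions for illustration, not the authors' code, and any `(n, d)` matrix of sentence vectors can stand in for real BERT output.

```python
from typing import Optional, Tuple

import numpy as np


def fit_whitening(embeddings: np.ndarray,
                  k: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Estimate whitening parameters from an (n, d) matrix of sentence embeddings.

    Returns (mu, w) such that (x - mu) @ w has approximately identity covariance.
    Hypothetical helper; `k` optionally keeps only the first k whitened dimensions.
    """
    mu = embeddings.mean(axis=0, keepdims=True)       # (1, d) mean vector
    cov = np.cov(embeddings, rowvar=False)            # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)                      # cov = u @ diag(s) @ u.T
    w = u @ np.diag(1.0 / np.sqrt(s + 1e-12))         # rotate, then rescale
    if k is not None:
        w = w[:, :k]                                  # keep the top-k directions
    return mu, w


def whiten(embeddings: np.ndarray, mu: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply a fitted whitening transform to (n, d) embeddings."""
    return (embeddings - mu) @ w


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In use, one would fit `mu` and `w` once on a corpus of sentence embeddings, whiten each sentence pair with the same parameters, and compare the whitened vectors with `cosine_similarity`. How that score is fused with the machine-translation signal is not specified in the abstract and is left out here.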