Information Science (情报科学), 2021, Vol. 39, Issue 8: 132-138.

• Business Research

A Study of Query Translation Methods: The Case of Chinese-English Cross-Language Information Retrieval

  

  • Publication date: 2021-08-01  Release date: 2021-08-05

  • Online: 2021-08-01  Published: 2021-08-05

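Abstract (Chinese version): [Purpose/Significance] Cross-language information retrieval (CLIR) research aims to remove the difficulties in querying information that arise from language differences and to make it more efficient to find specific information in large, heterogeneous collections. It also gives users a more convenient way to search documents in another language using a language they are familiar with. [Method/Process] Based on an analysis of the state of CLIR research in China and abroad, this paper introduces several current query translation methods, namely direct query translation, document translation, intermediate-language translation, and combined query-document translation, and compares their effectiveness. It then describes the key technologies of cross-language retrieval, and proposes and evaluates solutions to the ambiguity that arises when bilingual dictionaries, corpora, and machine translation are used. [Results/Conclusion] Natural language processing, co-occurrence analysis, relevance feedback, query expansion, bidirectional translation, and ontology-based information retrieval are used to ensure the coverage of the knowledge dictionary and to resolve ambiguity. An analysis of cross-language retrieval experiments shows that combining a knowledge dictionary, corpora, and a search engine improves query efficiency. [Innovation/Limitation] To address the shortage of terms in the dictionaries and corpora used for CLIR, this paper proposes obtaining information resources from web pages through a search engine to supplement the insufficient sentence pairs in the corpus. The paper focuses mainly on Chinese-English information retrieval, and its solutions need further study: for example, the difficulty of Chinese word segmentation and the low coverage of dictionaries still seriously affect retrieval efficiency.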
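The abstract names co-occurrence statistics as one way to resolve the ambiguity that arises when a bilingual dictionary offers several translations for a query term. The paper's own algorithm is not given here, so the following is only a minimal sketch under stated assumptions: the dictionary `BILINGUAL_DICT`, the corpus `CORPUS`, and the functions `cooccurrence_counts`, `cooc`, and `translate_query` are hypothetical, co-occurrence is counted at sentence level, and the English corpus is tokenized on whitespace (real Chinese queries would first need word segmentation).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy bilingual dictionary: each Chinese term has several
# English candidate translations, which is where the ambiguity comes from.
BILINGUAL_DICT = {
    "信息": ["information", "message", "news"],
    "检索": ["retrieval", "search", "lookup"],
}

# Hypothetical toy English corpus, used only to collect co-occurrence statistics.
CORPUS = [
    "information retrieval systems rank documents",
    "cross language information retrieval uses query translation",
    "send a message to the news desk",
    "search the news archive",
]


def cooccurrence_counts(sentences):
    """Count how many sentences contain each unordered pair of distinct words."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):  # pairs come out with a < b
            counts[(a, b)] += 1
    return counts


def cooc(counts, a, b):
    """Look up a pair regardless of order (keys are stored alphabetically)."""
    key = (a, b) if a <= b else (b, a)
    return counts[key]  # Counter returns 0 for unseen pairs


def translate_query(query_terms, dictionary, counts):
    """Direct dictionary-based query translation with co-occurrence disambiguation.

    For each source term, keep the candidate whose strongest co-occurrence
    with the candidates of every other query term sums to the highest score.
    """
    translation = []
    for term in query_terms:
        candidates = dictionary.get(term, [term])  # untranslated fallback
        others = [t for t in query_terms if t != term]

        def score(candidate):
            return sum(
                max(cooc(counts, candidate, c) for c in dictionary.get(o, [o]))
                for o in others
            )

        translation.append(max(candidates, key=score))
    return translation


counts = cooccurrence_counts(CORPUS)
print(translate_query(["信息", "检索"], BILINGUAL_DICT, counts))
# ['information', 'retrieval']
```

Scoring each candidate against the best candidate of every other term keeps the sketch simple; with ties, `max` keeps the first candidate listed in the dictionary.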

Abstract: [Purpose/significance] The purpose of cross-language information retrieval research is to eliminate the difficulty of information querying caused by language differences and to improve the efficiency of finding specific information in large, complex collections. It also gives users a more convenient way to retrieve documents in another language using a language they are familiar with. [Method/process] Based on an analysis of current research on cross-language information retrieval, this paper introduces several query translation methods, namely direct query translation, document translation, intermediate-language translation, and query-document translation, and compares their effectiveness. It then sets out the key technologies of cross-language retrieval and presents and evaluates solutions to the ambiguity produced by bilingual dictionaries, corpora, and machine translation technology. [Result/conclusion] Natural language processing, co-occurrence, relevance feedback, query expansion, bidirectional translation, and ontology-based information retrieval techniques are used to ensure knowledge dictionary coverage and to handle ambiguity; an analysis of cross-language retrieval experiments shows that combining a knowledge dictionary, corpora, and a search engine can improve query efficiency. [Innovation/limitation] To address the lack of words in dictionaries and corpora for cross-language information retrieval, this paper proposes using search engines to obtain information resources from web pages to make up for the shortage of sentence pairs in the corpus. The paper mainly discusses Chinese-English information retrieval, and its solutions need further study, for example with respect to the difficulty of Chinese word segmentation and the low dictionary coverage rate, which seriously affect retrieval efficiency.
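Bidirectional translation is listed among the techniques for handling ambiguity. One common reading, assumed here rather than taken from the paper, is a back-translation filter: translate each Chinese term into English, translate each English candidate back into Chinese, and keep only the candidates that lead back to the original term. The dictionaries `ZH_TO_EN` and `EN_TO_ZH` and the function `bidirectional_filter` are hypothetical toy examples, not the paper's resources.

```python
# Hypothetical toy dictionaries for both translation directions.
ZH_TO_EN = {
    "信息": ["information", "message", "news"],
    "检索": ["retrieval", "search", "lookup"],
}
EN_TO_ZH = {
    "information": ["信息", "资料"],
    "message": ["消息", "信息"],
    "news": ["新闻", "消息"],
    "retrieval": ["检索", "取回"],
    "search": ["搜索", "检索"],
    "lookup": ["查找"],
}


def bidirectional_filter(term, zh_to_en, en_to_zh):
    """Keep forward (zh -> en) candidates whose back-translation returns `term`."""
    candidates = zh_to_en.get(term, [])
    kept = [c for c in candidates if term in en_to_zh.get(c, [])]
    # If no candidate survives the round trip, fall back to all candidates
    # so the query term is not silently dropped.
    return kept or candidates


query = ["信息", "检索"]
print({t: bidirectional_filter(t, ZH_TO_EN, EN_TO_ZH) for t in query})
# {'信息': ['information', 'message'], '检索': ['retrieval', 'search']}
```

The fallback trades precision for recall: a term whose candidates all fail the round trip keeps its full ambiguity instead of disappearing from the translated query. A filter like this narrows the candidate set and can be followed by a scoring step such as co-occurrence.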