Information Science (情报科学) ›› 2023, Vol. 41 ›› Issue (1): 61-70.

• Theoretical Research •

Research on a Novel Information Retrieval Model Based on Digital Signal Processing Theory

  

  • Online:2023-01-01 Published:2023-04-06

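Abstract (translated from Chinese): 【Purpose/significance】The big data era places high demands on the precision of the
retrieval models used in information retrieval systems across domains. However, research on traditional retrieval
models has hit a bottleneck: models proposed in recent years improve precision only marginally and cannot
adequately meet users' current need for precise queries. High-precision retrieval models therefore urgently need to
be explored. In recent years, a new retrieval model framework based on digital signal processing theory (Digital
Signal Processing Framework, DSPF) has been proposed, and retrieval models built on this framework have been
shown to have a significant precision advantage over traditional retrieval models.【Method/process】Accordingly,
building on the digital signal processing framework, this study introduces the term weighting methods of the classic
probabilistic models F2LOG and F2EXP and proposes the models DSPF-F2LOG and DSPF-F2EXP. To verify their
precision, this study experimentally compares them with several classic retrieval models on multiple types of
standard datasets using multiple precision metrics.【Result/conclusion】The experimental results show that the
proposed models generally achieve higher precision than the classic retrieval models and perform at least on par
with the DSP-based retrieval model that currently has the highest precision. The two high-precision DSP models
proposed in this study can effectively improve the precision of current information retrieval systems in various
domains on unstructured text.【Innovation/limitation】This study proposes the high-precision retrieval models
DSPF-F2LOG and DSPF-F2EXP based on digital signal processing theory.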

Abstract: 【Purpose/significance】In the age of big data, the information retrieval (IR) models used by information retrieval systems in different industries are required to have high precision. However, research on traditional IR models has hit a plateau: recently proposed traditional models outperform their baseline models by only a very small margin, so they cannot satisfy the needs of IR system users well. IR models with high precision therefore urgently need to be explored. In recent years, a new framework for IR models based on the theory of digital signal processing (DSP) has been proposed, and models built on this framework have been shown to have a clear precision advantage over traditional ones.【Method/process】Accordingly, based on the DSP framework (DSPF), this research introduces the term weighting methods of the classic probabilistic models F2LOG and F2EXP and proposes two new IR models, DSPF-F2LOG and DSPF-F2EXP. To verify the precision of these models, this research conducts extensive experiments comparing them with various classic baseline models on several categories of standard datasets and under different metrics.【Result/conclusion】The experimental results show that the proposed models outperform the classic IR models and are comparable to the recently proposed DSPF-based IR model that currently has the highest precision among DSP-based IR models. The two high-precision IR models proposed in this research can improve the precision of IR systems in different industries when applied to unstructured data.【Innovation/limitation】This study proposes two high-precision information retrieval models, denoted DSPF-F2LOG and DSPF-F2EXP, based on the theory of digital signal processing.
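
For readers unfamiliar with the two term weighting methods named in the abstract, the following is a minimal Python sketch of the per-term weights of F2LOG and F2EXP as they are commonly stated in the retrieval literature. It is an illustration only, not the authors' implementation: the function and parameter names, and the default values of `s` and `k`, are assumptions, and the query term frequency factor is omitted for brevity. How DSPF turns such weights into signals is not described in the abstract and is not shown here.

```python
import math


def tf_norm(tf, doc_len, avg_doc_len, s=0.5):
    """Length-normalised term frequency shared by F2LOG and F2EXP.

    tf: occurrences of the term in the document.
    s:  length-normalisation parameter (default is an assumed value).
    """
    return tf / (tf + s + s * doc_len / avg_doc_len)


def f2log_weight(tf, df, n_docs, doc_len, avg_doc_len, s=0.5):
    """F2LOG-style weight: normalised tf times ln((N + 1) / df)."""
    return tf_norm(tf, doc_len, avg_doc_len, s) * math.log((n_docs + 1) / df)


def f2exp_weight(tf, df, n_docs, doc_len, avg_doc_len, s=0.5, k=0.35):
    """F2EXP-style weight: normalised tf times (N / df) ** k."""
    return tf_norm(tf, doc_len, avg_doc_len, s) * (n_docs / df) ** k


if __name__ == "__main__":
    # Hypothetical collection statistics, chosen only for illustration.
    print(f2log_weight(tf=2, df=10, n_docs=999, doc_len=100, avg_doc_len=100))
    print(f2exp_weight(tf=2, df=10, n_docs=1000, doc_len=100, avg_doc_len=100))
```

The two functions differ only in how they reward rare terms: F2LOG uses a logarithmic inverse document frequency, while F2EXP uses a power of the ratio N / df controlled by `k`.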