情报科学 (Information Science) ›› 2021, Vol. 39 ›› Issue (6): 108-116.

• Practice Research •

Author Name Disambiguation in Scientific and Technical Literature Based on Multi-Source Data

  

  • Online: 2021-06-01 Published: 2021-06-25

Abstract: 【Purpose/significance】This paper uses multi-source data to disambiguate author names in scientific and technical literature, so that authors and their publications are in one-to-one correspondence.【Method/process】The collected multi-source data are first preprocessed to form a same-name data set, made up of the publications of authors who share a name and still need to be resolved. Academic circles are then constructed from co-authorship relations to detect ambiguity. Finally, the ambiguity is resolved using institution and research field.【Result/conclusion】The experiment collected bibliographic records from six disciplines: education at all levels, automation and computer technology, information and knowledge dissemination, mathematical sciences and chemistry, radio electronics, and Chinese medicine. The rule-based disambiguation method proposed in this paper performs well on these data. Combining multi-source data fusion with disambiguation on multiple indicators (institution and field) achieves a high level of disambiguation performance.【Innovation/limitation】The paper addresses the difficult case of disambiguating authors within the same institution and the same field, takes incremental updates into account, and builds a complete disambiguation model.
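
The pipeline summarized in the abstract (group same-name records, link them through co-authorship, then use institution and field) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the actual rules, so the `Paper` record, the `disambiguate` function, and both merge rules (any shared co-author; identical institution and field) are assumptions chosen only for illustration. In particular, the second rule is deliberately naive and does not handle the same-institution, same-field case that the paper reports solving.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """One bibliographic record for a single ambiguous author name (hypothetical schema)."""
    paper_id: str
    coauthors: frozenset  # names of the other authors on the paper
    institution: str
    field: str


class UnionFind:
    """Disjoint-set structure used to merge papers into author clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def disambiguate(papers):
    """Cluster same-name papers; each cluster stands for one presumed real author."""
    uf = UnionFind(p.paper_id for p in papers)

    # Assumed rule 1 ("academic circle"): papers sharing a co-author are merged.
    by_coauthor = defaultdict(list)
    for p in papers:
        for name in p.coauthors:
            by_coauthor[name].append(p.paper_id)
    for ids in by_coauthor.values():
        for other in ids[1:]:
            uf.union(ids[0], other)

    # Assumed rule 2: papers with identical institution and field are merged.
    by_affiliation = defaultdict(list)
    for p in papers:
        by_affiliation[(p.institution, p.field)].append(p.paper_id)
    for ids in by_affiliation.values():
        for other in ids[1:]:
            uf.union(ids[0], other)

    clusters = defaultdict(set)
    for p in papers:
        clusters[uf.find(p.paper_id)].add(p.paper_id)
    return sorted(sorted(c) for c in clusters.values())


# Invented sample records, all published under the same author name.
papers = [
    Paper("p1", frozenset({"Li Na"}), "Univ A", "computer science"),
    Paper("p2", frozenset({"Li Na", "Zhao Lei"}), "Univ B", "computer science"),
    Paper("p3", frozenset(), "Univ B", "computer science"),
    Paper("p4", frozenset({"Chen Yu"}), "Hospital C", "medicine"),
]
print(disambiguate(papers))
```

In this invented sample, p1 and p2 are linked by a shared co-author and p2 and p3 by a matching institution and field, while p4 stays in its own cluster. Because the clusters are built with a union-find structure, a newly collected record could in principle be unioned into existing clusters without recomputing from scratch, which is one possible way to approach the incremental setting the abstract mentions.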