Information Science (情报科学) ›› 2021, Vol. 39 ›› Issue (5): 115-123.

• Practice Research (业务研究) •

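Constructing a Knowledge Graph of the Library and Information Science Academic Domain for Knowledge Question Answering Systems: A Multi-Source Data Integration Perspective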

  

  • Publication date: 2021-05-01 Release date: 2021-05-12

  • Online:2021-05-01 Published:2021-05-12

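Abstract (translated from the Chinese version):

【Purpose/significance】In the age of artificial intelligence, knowledge question answering (Q&A) systems have become a typical direction for domain knowledge services. An academic domain knowledge graph is a prerequisite and the knowledge base for a discipline-level Q&A system, so exploring how to construct one from the perspective of multi-source data integration matters greatly for realizing such systems.【Method/process】To address the construction of a knowledge graph for the library and information science (LIS) academic domain, semi-structured interviews were used to survey the existing platforms and potential needs for LIS academic knowledge Q&A, and a knowledge Q&A demand model was built. Guided by these needs, multi-source data were collected from CNKI, Web of Science, Baidu Academic (百度学术), Baidu Encyclopedia (百度百科), ScienceNet (科学网), and other platforms to form the core data set. The study explored a construction scheme in which knowledge extraction uses the Python libraries "selenium+beautifulsoup+pandas" together with co-occurrence analysis, knowledge fusion uses the edit distance algorithm and the Jaccard algorithm, and the Neo4j graph database handles knowledge storage and visualization.【Result/conclusion】Taking the construction of an academic domain knowledge graph on "knowledge extraction" as an example, knowledge service functions such as semantic retrieval, knowledge Q&A, and collaboration recommendation were explored on this basis. The results show that the knowledge graph built with the proposed method has relatively complete data and relatively accurate knowledge representation, and can respond effectively to users' need to acquire domain knowledge precisely.【Innovation/limitation】The LIS academic domain knowledge graph built in this paper from the perspective of multi-source data integration represents knowledge relatively accurately and can effectively meet users' need to acquire domain knowledge precisely. Future research will expand the sample data and use logical reasoning to determine the relationships between entities.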

Abstract:

【Purpose/significance】In the age of artificial intelligence, knowledge question answering (Q&A) systems have become a typical development trend in domain knowledge services. Constructing an academic domain knowledge graph is an indispensable condition and the knowledge foundation for a discipline knowledge Q&A system.【Method/process】To address the problem of constructing an LIS academic knowledge graph, we used semi-structured interviews to investigate the existing platforms and potential needs for LIS academic knowledge Q&A and built a knowledge Q&A demand model. Guided by these demands, we integrated CNKI, Web of Science, Baidu Academic, Baidu Encyclopedia, and ScienceNet to obtain multi-source data as the core data set. We then explored a knowledge graph construction scheme that uses the Python libraries "selenium+beautifulsoup+pandas" and co-occurrence analysis for knowledge extraction, the edit distance algorithm and the Jaccard algorithm for knowledge fusion, and the Neo4j graph database for knowledge storage and visualization.【Result/conclusion】Taking the construction of a "Knowledge Extraction" academic knowledge graph as an example, we explored knowledge service functions such as semantic retrieval, knowledge question answering, and cooperation recommendation.【Innovation/limitation】This paper builds the LIS academic domain knowledge graph from the perspective of multi-source data integration; its knowledge representation is relatively accurate, and it can effectively meet users' need to acquire domain knowledge precisely. In future research, the sample data will be expanded and the relationships between entities will be determined through logical reasoning.
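
As an illustration of the knowledge fusion step named in the abstract, the following is a minimal Python sketch that combines edit distance and Jaccard similarity to decide whether two records from different sources refer to the same entity. The abstract does not say how the two measures were combined, so the function names, the normalization, the thresholds (0.8 and 0.5), and the rule of comparing names by edit distance and attribute sets by Jaccard are all assumptions made for illustration only, not the authors' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: size of intersection over size of union."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def same_entity(name_a: str, name_b: str, attrs_a: set, attrs_b: set,
                name_threshold: float = 0.8,   # assumed value
                attr_threshold: float = 0.5    # assumed value
                ) -> bool:
    """Hypothetical fusion rule: names must be close AND attributes must overlap."""
    name_ok = edit_similarity(name_a.lower(), name_b.lower()) >= name_threshold
    attr_ok = jaccard(attrs_a, attrs_b) >= attr_threshold
    return name_ok and attr_ok


if __name__ == "__main__":
    # Two made-up records that might come from different source platforms.
    print(same_entity("Knowledge Graph", "knowledge graphs",
                      {"LIS", "Neo4j"}, {"LIS", "Neo4j", "CNKI"}))
```

Requiring both conditions is a conservative choice: edit distance alone can merge distinct entities with similar names, and comparing attribute sets with Jaccard gives a second, independent signal before two nodes are merged in the graph.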