Information Science (情报科学), 2022, Vol. 40, Issue 9: 154-158.

• Doctoral Forum •

Research on Methods for Identifying and Classifying Digital Humanities Literature in Library and Information Science

  • Publication date: 2022-09-01  Release date: 2022-10-10

Abstract: 【Purpose/significance】As a cross-disciplinary field bridging the humanities and social sciences with computer technology,
Digital Humanities is developing rapidly, yet it still lacks a clear conceptual definition and dedicated journals, which makes its
literature difficult to collect. A suitable identification and classification model is therefore needed to build a thematic Digital
Humanities literature collection that supports Digital Humanities research. 【Method/process】The study analyzes what Digital Humanities
covers as a discipline, summarizes the characteristics of Digital Humanities literature, and, on the basis of manually read and
annotated documents, builds a machine learning model that identifies and classifies Digital Humanities literature automatically.
【Result/conclusion】The proposed machine-learning-based identification and classification model achieves relatively good identification
performance on Digital Humanities literature in Library and Information Science. 【Innovation/limitation】Applying machine learning
algorithms to Digital Humanities literature classification copes well with the field's complex vocabulary and the small amount of
available data. Further research could use more complex models such as deep learning and extend the approach to multi-class
classification of Digital Humanities literature from different fields.
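The abstract does not name the specific algorithm, so the following is only an illustrative sketch of the general workflow it describes: manually annotated records go in, and an automatic DH/non-DH classifier comes out. It swaps in a multinomial naive Bayes classifier with Laplace smoothing, a common baseline for small labeled text collections. The names `NaiveBayesTextClassifier`, `tokenize` and `precision_recall_f1`, the "DH"/"non-DH" labels, and the toy English titles are all hypothetical and are not the authors' model or data. Chinese-language records would need a word segmenter (for example jieba) in place of the simple regex tokenizer.

```python
"""Hypothetical sketch: DH vs. non-DH classification of literature records
with multinomial naive Bayes (not the model used in the paper)."""
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Toy English tokenizer; Chinese records would need a word segmenter instead.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_doc_counts: Counter = Counter()      # documents per label
        self.word_counts: dict = defaultdict(Counter)  # token counts per label
        self.vocab: set = set()

    def fit(self, docs: list[str], labels: list[str]) -> NaiveBayesTextClassifier:
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum of log P(token | label), smoothed over the vocabulary
        total_docs = sum(self.class_doc_counts.values())
        score = math.log(self.class_doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[t] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        tokens = tokenize(doc)
        return max(self.class_doc_counts, key=lambda c: self._log_score(tokens, c))


def precision_recall_f1(gold, pred, positive="DH"):
    """Precision, recall and F1 for the positive (DH) class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical manually annotated titles (stand-ins for real labeled records).
train_docs = [
    "text mining of classical chinese poetry corpus",
    "digital archive of historical manuscripts with gis mapping",
    "knowledge graph for cultural heritage collections",
    "topic modeling of ancient literature texts",
    "library user satisfaction survey analysis",
    "citation analysis of information science journals",
    "university library budget and staffing report",
    "evaluation of reference desk service quality",
]
train_labels = ["DH"] * 4 + ["non-DH"] * 4

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
test_docs = ["gis mapping of historical poetry", "survey of library service quality"]
preds = [model.predict(d) for d in test_docs]
print(preds)                                          # ['DH', 'non-DH']
print(precision_recall_f1(["DH", "non-DH"], preds))   # (1.0, 1.0, 1.0)
```

Naive Bayes is used here only because it trains from word counts alone and usually gives a reasonable baseline on very small labeled sets. The abstract itself names deep learning models as a direction for future work.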