Information Science (情报科学), 2022, Vol. 40, Issue (10): 3-11.

• Special Topic •

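An Automatic Extraction Model for Concept Hierarchical Relations Based on Word Co-occurrence and Word Vectors: A Case Study of the Academic Paper Evaluation Domain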

  

  • Published: 2022-10-01  Online: 2022-10-01


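Abstract: [Purpose/Significance] Automatic extraction of concept hierarchical relations makes it possible to divide concepts into fine-grained semantic levels quickly and automatically on large datasets, providing a reference for the subsequent refined construction of domain ontologies. [Method/Process] First, starting from a term set composed of compound terms and keywords, candidates are filtered by term frequency, document frequency, and semantic similarity to obtain the concept set of the academic paper evaluation domain. Second, both concept co-occurrence relations and contextual semantic information are taken into account: the former is represented by a document-concept matrix and a concept co-occurrence matrix, the latter by word2vec word vectors, and the two are integrated through cosine similarity to obtain a concept similarity matrix. Finally, taking the concept with the highest degree of association as the cluster center, spectral clustering is applied to the similarity matrix to obtain the concept hierarchy of the academic paper evaluation domain. [Result/Conclusion] Experiments show that the proposed model achieves high accuracy and that the constructed domain concept hierarchy is reasonable. [Innovation/Limitation] This paper proposes an automatic extraction model for concept hierarchical relations based on word co-occurrence and word vectors, which extracts these relations automatically. However, the method used to determine class labels is relatively simple and merits further study.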
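The first two steps of the method (filtering the term set into a concept set, then building an integrated concept similarity matrix) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the thresholds `min_tf`, `min_df`, and `min_sim`, the hypothetical seed terms used as the reference point for semantic similarity, the weight `alpha`, and the toy 3-dimensional vectors standing in for trained word2vec embeddings are all assumptions. The abstract only says that co-occurrence and word-vector information are integrated through cosine similarity, so the weighted average of two cosine scores used here is one assumed reading.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_concepts(docs, candidates, vectors, seeds, min_tf=2, min_df=2, min_sim=0.5):
    """Step 1 (sketch): keep candidate terms that are frequent, spread across documents,
    and close (cosine) to the centroid of hypothetical seed domain terms."""
    tf = Counter(t for doc in docs for t in doc)       # total occurrences
    df = Counter(t for doc in docs for t in set(doc))  # number of documents containing t
    dim = len(vectors[seeds[0]])
    centroid = [sum(vectors[s][k] for s in seeds) / len(seeds) for k in range(dim)]
    return [
        t for t in candidates
        if tf[t] >= min_tf and df[t] >= min_df
        and t in vectors and cosine(vectors[t], centroid) >= min_sim
    ]


def doc_concept_matrix(docs, concepts):
    """A[d][i] = 1 if concept i occurs in document d, else 0."""
    return [[1 if c in doc else 0 for c in concepts] for doc in docs]


def cooccurrence_matrix(a):
    """C = A^T A: C[i][j] = number of documents containing both concept i and concept j."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def integrated_similarity(docs, concepts, vectors, alpha=0.5):
    """Step 2 (sketch): weighted mix of co-occurrence cosine (rows of C) and
    word-vector cosine. `alpha` is assumed; the paper's exact scheme may differ."""
    c = cooccurrence_matrix(doc_concept_matrix(docs, concepts))
    return [
        [alpha * cosine(c[i], c[j])
         + (1 - alpha) * cosine(vectors[concepts[i]], vectors[concepts[j]])
         for j in range(len(concepts))]
        for i in range(len(concepts))
    ]


docs = [
    ["citation", "impact factor", "citation", "weather"],
    ["citation", "impact factor", "rigor"],
    ["peer review", "novelty", "weather"],
    ["peer review", "novelty"],
]
vectors = {  # toy stand-ins for word2vec embeddings
    "citation": [0.9, 0.1, 0.2], "impact factor": [0.8, 0.2, 0.1],
    "peer review": [0.2, 0.9, 0.1], "novelty": [0.3, 0.8, 0.2],
    "weather": [0.0, 0.1, 1.0], "rigor": [0.4, 0.7, 0.1],
}
candidates = ["citation", "impact factor", "peer review", "novelty", "weather", "rigor"]
kept = filter_concepts(docs, candidates, vectors, seeds=["citation", "peer review"])
sim = integrated_similarity(docs, kept, vectors)
print(kept)
for name, row in zip(kept, sim):
    print(f"{name:>13}", [round(x, 2) for x in row])
```

In this toy run, "rigor" is dropped because it appears only once, and "weather" passes the frequency thresholds but is semantically far from the seed terms; the four remaining concepts form two strongly related pairs in the resulting similarity matrix.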

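For the final step, the sketch below runs normalized spectral clustering (the Ng-Jordan-Weiss recipe: embed concepts using the top-k eigenvectors of D^{-1/2} W D^{-1/2}, row-normalize, then apply k-means) on a concept similarity matrix W. It is written in pure Python with a small Jacobi eigensolver so it has no third-party dependencies. The abstract does not spell out how "the concept with the highest degree of association" is chosen, so the hypothetical `cluster_centers` function assumes it is the cluster member with the largest summed similarity to the other members and uses it as the cluster's label (the parent concept). The farthest-point k-means initialization, k = 2, and the toy matrix are likewise illustrative choices, not details from the paper.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, V); column i of V is the eigenvector of eigenvalue i."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, iters=100):
    """k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clusters(w, k):
    """Normalized spectral clustering (Ng-Jordan-Weiss) of a symmetric similarity matrix."""
    n = len(w)
    d = [sum(row) for row in w]
    m = [[w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(m)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    rows = [[vecs[r][i] for i in top] for r in range(n)]  # spectral embedding
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in rows]
    return kmeans([[x / nrm for x in row] for row, nrm in zip(rows, norms)], k)


def cluster_centers(w, labels):
    """Assumed label rule: per cluster, the member with the largest summed
    similarity to the other members becomes the center (parent concept)."""
    centers = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        centers[c] = max(members, key=lambda i: sum(w[i][j] for j in members if j != i))
    return centers


concepts = ["bibliometric indicators", "impact factor", "h-index",
            "peer review", "novelty", "rigor"]
W = [  # toy concept similarity matrix with two clear groups
    [1.00, 0.90, 0.85, 0.05, 0.05, 0.05],
    [0.90, 1.00, 0.80, 0.05, 0.05, 0.05],
    [0.85, 0.80, 1.00, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 1.00, 0.85, 0.80],
    [0.05, 0.05, 0.05, 0.85, 1.00, 0.70],
    [0.05, 0.05, 0.05, 0.80, 0.70, 1.00],
]
labels = spectral_clusters(W, 2)
for c, center in cluster_centers(W, labels).items():
    children = [concepts[i] for i, lab in enumerate(labels) if lab == c and i != center]
    print(concepts[center], "->", children)
```

With this toy matrix the first three and last three concepts fall into separate clusters, with "bibliometric indicators" and "peer review" selected as the two centers. At realistic scale, a library eigensolver such as NumPy's `numpy.linalg.eigh` would replace the hand-written Jacobi routine.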