Information Science (情报科学)


Research on Multi-label Text Classification Method of University Library's Self-Media Platform in 5G Environment

  1 Library of Hubei Institute of Technology, Huangshi, Hubei 435003, China; 2 School of Economics and Management, Hubei University of Technology, Wuhan, Hubei 430070, China; 3 Library of Wuhan University of Science and Technology, Wuhan, Hubei 430080, China; 4 Library of Hubei University of Chinese Medicine, Wuhan, Hubei 430070, China

Abstract:

[Purpose/Significance] Multi-label text on self-media platforms is high-dimensional and imbalanced, which degrades text classification performance. This paper therefore studies a multi-label text classification method for university library self-media platforms in the 5G environment. [Method/Process] The multi-label text collected from these platforms is first preprocessed by removing meaningless data, segmenting the text into words, and removing stop words. An improved principal component analysis method is then used to reduce the dimensionality of the text, and a vector space model is used to balance it. Finally, a text classifier is built on the processed text using the AdaBoost and SVM algorithms to perform multi-label text classification. [Results/Conclusions] Experimental results show that the proposed method lowers the Hamming loss, raises the F1 value, and takes little time, indicating effective and reliable multi-label text classification. [Innovation/Limitations] Because the number of data sets in this study is limited, the testing and validation results have certain limitations. Future research is expected to use a richer database to further improve and extend the proposed method.
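The preprocessing and vector space representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the regex tokenizer (a stand-in for a Chinese word segmenter), and the function names `preprocess` and `tfidf_vectors` are all assumptions made for illustration. The abstract does not specify how the vector space model is used for balancing, nor what the improved principal component analysis changes, so neither step is shown here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in", "and"}


def preprocess(text):
    """Remove URLs (treated here as meaningless data), tokenize, drop stop words.

    The regex tokenizer is an assumption standing in for the text
    segmentation step; real Chinese text would need a word segmenter.
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Represent tokenized documents as TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append([0.0] * len(vocab))
            continue
        term_freq = Counter(doc)
        vectors.append(
            [term_freq[t] / len(doc) * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, vectors


posts = [
    "The library opens a new 5G reading room: https://example.org/notice",
    "Library hours and borrowing rules",
]
docs = [preprocess(p) for p in posts]
vocab, vectors = tfidf_vectors(docs)
```

In this sketch a term that occurs in every document, such as "library", receives a weight of zero, while terms unique to one post receive positive weights. The resulting vectors are the kind of high-dimensional input that a dimensionality reduction step would then compress.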

Key words:

5G university library, self-media platform, multi-label text, classification, dimensionality reduction, balancing
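The two evaluation measures named in the abstract, Hamming loss and the F1 value, can be computed as in the sketch below. The label matrices are made-up illustration data, not results from the paper, and micro-averaging is an assumption: the abstract does not say which F1 variant was used.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly, over all samples and labels."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical binary indicator matrices: 3 texts, 3 candidate labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 9
f1 = micro_f1(y_true, y_pred)        # tp=4, fp=1, fn=1 -> 8 / 10
```

A lower Hamming loss and a higher F1 value both indicate better multi-label predictions, which is the direction of improvement the abstract reports.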