Information Science (情报科学), 2021, Vol. 39, Issue (1): 142-147.

• Business Research •

Research on Microblog Keyword Extraction Based on Semantic Concepts and Word Co-occurrence

  

  • Publication date: 2021-01-01 Release date: 2021-01-25



Abstract: [Purpose/Significance] To extract accurate keywords from the massive volume of microblog posts and thereby provide a valuable reference for public-opinion analysis by governments and enterprises.
[Method/Process] After analyzing the characteristics and shortcomings of traditional microblog keyword extraction methods, this paper proposes a microblog keyword extraction method based on semantic concepts and word co-occurrence. The method uses a text expansion strategy to turn short microblog posts into longer texts, uses a semantic dictionary to extend the words in each post with their semantic concepts, assigns structural weights to words according to the structure of microblog text, and finally extracts keywords by also taking the degree of word co-occurrence into account.
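The abstract names the method's steps but not its formulas, so the Python sketch below only illustrates how the steps could fit together; it is not the authors' algorithm. Every concrete choice in it is a hypothetical assumption: text expansion is modelled as appending related comments or reposts to the post, a small hand-written dict (`concept_dict`) stands in for the semantic dictionary, `STRUCTURE_WEIGHTS` gives words in a #topic# hashtag twice the weight of body words, and the final score multiplies a concept's structure-weighted frequency by (1 + its co-occurrence degree divided by the number of other concepts). Input is assumed to be already word-segmented with stop words removed; the function and parameter names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical structural weights: words inside a #topic# hashtag are
# weighted above words in the ordinary body text (values are assumptions).
STRUCTURE_WEIGHTS = {"topic": 2.0, "body": 1.0}


def expand(post, related):
    """Text expansion (assumed strategy): treat the post together with its
    related short texts, e.g. comments or reposts, as one longer document."""
    return [post] + list(related)


def to_concept(word, concept_dict):
    """Semantic concept extension: map a word to its concept code in a toy
    semantic dictionary; words not in the dictionary stand for themselves."""
    return concept_dict.get(word, word)


def extract_keywords(post, related, concept_dict, top_k=3):
    """Each text is a list of (segment_type, tokens) pairs, already
    word-segmented and stripped of stop words."""
    weight = Counter()               # structure-weighted concept frequency
    surface = defaultdict(Counter)   # concept -> weighted counts of its words
    neighbours = defaultdict(set)    # concept -> concepts it co-occurs with
    for text in expand(post, related):
        seen = set()
        for seg_type, tokens in text:
            w = STRUCTURE_WEIGHTS[seg_type]
            for word in tokens:
                c = to_concept(word, concept_dict)
                weight[c] += w
                surface[c][word] += w
                seen.add(c)
        # Co-occurrence window = one text (post, comment or repost).
        for a, b in combinations(seen, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    n = len(weight)

    def score(c):
        # Boost the weighted frequency by the normalised co-occurrence degree.
        degree = len(neighbours[c]) / (n - 1) if n > 1 else 0.0
        return weight[c] * (1 + degree)

    ranked = sorted(weight, key=lambda c: (-score(c), c))[:top_k]
    # Report each concept by its highest-weighted surface word.
    return [min(surface[c].items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for c in ranked]


concepts = {"phone": "DEVICE", "smartphone": "DEVICE",
            "battery": "POWER", "charger": "POWER"}
post = [("topic", ["smartphone", "battery"]),
        ("body", ["phone", "battery", "overheat"])]
related = [[("body", ["charger", "phone"])],
           [("body", ["phone", "screen"])]]
print(extract_keywords(post, related, concepts))  # ['phone', 'battery', 'overheat']
```

Because `phone` and `smartphone` map to the same concept, their weights are pooled and the concept is reported by its most heavily weighted surface word; with an empty dictionary the two words would compete as separate candidates.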
[Result/Conclusion] Experimental results show that the proposed microblog keyword extraction algorithm outperforms traditional methods and effectively improves keyword extraction performance.
[Innovation/Limitation] Combining semantic concepts with word co-occurrence for microblog keyword extraction is a new line of exploration. Because the word segmentation method used in the algorithm may segment some new Internet terms incorrectly, keyword extraction accuracy is slightly reduced.