情报科学 (Information Science), 2024, Vol. 42, Issue 2: 165-173.

• Doctoral Forum •

Research on a Data Storytelling Description Model for Classification Analysis Results

  

  • Online:2024-02-05 Published:2024-06-07

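Abstract (translated from the Chinese):

【Purpose/significance】 Classification analysis is already widely applied in practice. Building a data storytelling description model for classification analysis results aims to clarify how data storytelling description can be implemented and to bring the explanatory function of data stories into play. 【Method/process】 This study combines literature review and model construction, following the path "problem statement - requirement identification - content extraction - model design". It uses BPMN to design the data storytelling description process, then builds on that process to examine the content modules and key activities involved in constructing the model, and finally arrives at the data storytelling description model. 【Result/conclusion】 The study distills the explanation needs and content for the story context, the descriptive statistics of the data, and the analysis results. It separates out the data story content modules (a context module, a descriptive statistics module, and an explanation module) and a description model organized by characters, events, and plot. Finally, a model instance is built on the UCI Breast-Cancer data set, putting the proposed model into practical use. 【Innovation/limitation】 The study proposes a data storytelling description model for classification analysis results, which offers theoretical guidance for implementing data storytelling description. Future work should incorporate the characteristics of other types of analysis results to improve the model's generalizability and, on that basis, tackle the task of automatically generating data stories.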

Abstract:

【Purpose/significance】 Classification analysis is widely used in practice. This study constructs a data storytelling description model for classification analysis results, aiming to generate data stories and bring out their explanatory function. 【Method/process】 Literature research and model construction are combined, and the research follows the path "question raising - requirement identification - content extraction - model design". BPMN is used to design the data storytelling description process. The study then explores the content modules and key activities involved in constructing the description model, and finally forms the data storytelling description model. 【Result/conclusion】 The needs and contents of data storytelling description comprise the story context, descriptive statistics of the data, and interpretation of the classification analysis results. The data story content modules (story context, descriptive statistics, and explanation) are extracted, and a description model organized around characters, events, and plots is constructed. Finally, a model instance is built on the UCI Breast-Cancer data set to demonstrate the practical application of the model. 【Innovation/limitation】 This study proposes a data storytelling description model for classification analysis results. In future work, the model's generalization will be improved by incorporating the characteristics of other types of analysis results, and the automatic generation of data stories will be investigated.
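
The three content modules named in the abstract (context, descriptive statistics, explanation) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' model: the `DataStory` class, the `build_story` function, the wording of each sentence, and the toy labels are all hypothetical, and the abstract does not say how characters, events, and plot are actually encoded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DataStory:
    """Hypothetical container for the three content modules named in the abstract."""
    context: str       # context module: who is involved and what task is at stake
    descriptive: str   # descriptive statistics module: what the data look like
    explanation: str   # explanation module: what the classification result means

    def render(self) -> str:
        # Assumed "plot": present the modules in a fixed narrative order.
        return "\n".join([self.context, self.descriptive, self.explanation])


def build_story(character: str, task: str, y_true: list, y_pred: list) -> DataStory:
    """Assemble a story from true labels and a classifier's predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    counts = Counter(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    context = f"{character} wants to {task}."
    class_summary = ", ".join(
        f"{n} labelled '{c}'" for c, n in sorted(counts.items())
    )
    descriptive = f"The data set holds {len(y_true)} records: {class_summary}."
    explanation = (
        f"The classifier assigned the correct class to {correct} of "
        f"{len(y_true)} records (accuracy {accuracy:.0%})."
    )
    return DataStory(context, descriptive, explanation)


# Toy labels invented for illustration only (not taken from UCI Breast-Cancer).
story = build_story(
    character="A clinician",
    task="judge whether a tumour is likely to recur",
    y_true=["no-recurrence", "no-recurrence", "recurrence", "no-recurrence"],
    y_pred=["no-recurrence", "recurrence", "recurrence", "no-recurrence"],
)
print(story.render())
```

Keeping each module as a separate field means one part of the story (for example the explanation) could be replaced without touching the others, which is one plausible reading of the modular design the abstract describes.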