ISSN 2079-3537

Scientific Visualization, 2018, volume 10, number 5, pages 32 - 44, DOI: 10.26583/sv.10.5.03

An Application of Visual Analytics Methods to Cluster and Categorize Data Processing Jobs in High Energy and Nuclear Physics Experiments

Authors: T. Galkin1,A, M. Grigoryeva2,B,D, A. Klimentov3,B,C, T. Korchuganova4,D, I. Milman5,A, V. Pilyugin6,A, M. Titov7,B

A National Research Nuclear University “MEPhI”, Russia

B National Research Center “Kurchatov Institute”, Russia

C Brookhaven National Laboratory, USA

D National Research Tomsk Polytechnic University, Russia

1 ORCID: 0000-0003-2859-6275, TPGalkin@mephi.ru

2 ORCID: 0000-0002-8851-2187, Maria.Grigorieva@cern.ch

3 ORCID: 0000-0003-2748-4829, Alexei.Klimentov@cern.ch

4 ORCID: 0000-0001-5792-8182, Tatiana.Korchuganova@cern.ch

5 ORCID: 0000-0001-9705-9401, Igal.Milman@gmail.com

6 ORCID: 0000-0001-8648-1690, VVPilyugin@mephi.ru

7 ORCID: 0000-0003-2357-7382, Mikhail.Titov@cern.ch

 

Abstract

Hundreds of petabytes of experimental data in high energy and nuclear physics (HENP) have been collected by unique scientific facilities, such as the LHC, RHIC and KEK. As the accelerators are being upgraded with increased energy and luminosity, data volumes are rapidly growing and have reached the exabyte scale. This leads to an increase in the number of data processing and analysis tasks, which continuously compete for computational resources. The growing number of processing tasks requires an increase in the capacity of the computing infrastructure, which can only be achieved through the use of high-performance computing resources. Along with the grid, these resources form a heterogeneous distributed computing environment comprising hundreds of distributed computing centers. Given a distributed model of data processing and analysis, the optimization of data and workload management systems becomes a critical task, and the absence of an adequate solution for this task leads to economic, functional and time losses. This paper describes the first stage of a study that aims to increase the stability and efficiency of workflow management systems for mega-science experiments by applying visual analytics methods, that is, data analysis through an interactive GUI. Currently, visual analytics methods are widely used in various domains of data analysis, including scientific research, engineering, management, financial monitoring and information security. With data analysis tools that support data visualization, information can be analyzed by an individual who is well informed about the object of investigation but not necessarily aware of the internal structure of the data models. Furthermore, visual analytics simplifies navigation through data analysis results: the data are represented by graphical objects, which can be manipulated either with a mouse or on touch-sensitive screens.
In this case, human spatial thinking is actively used to identify new tendencies and patterns in the collected data, without users having to struggle with the underlying software.
In this paper, we demonstrate visual methods for clustering the computing tasks of a workload management system, using the ATLAS experiment at the LHC as an example. The interdependencies and correlations between various task or job parameters are investigated and graphically interpreted in an n-dimensional space using 3D projections. The visual analysis allows us to group together similar jobs, identify anomalous jobs, and determine the cause of such anomalies.

 

Keywords: visual analytics, high energy physics, nuclear physics, ATLAS experiment, cluster analysis.