ISSN 2079-3537      

Scientific Visualization, 2020, volume 12, number 4, pages 9 - 22, DOI: 10.26583/sv.12.4.02

Visual Analysis of Text Data Volume by Frequencies of Joint Use of Nouns and Adjectives

Authors: A.E. Bondarev1,A, A.V. Bondarenko2,B, V.A. Galaktionov3,A

A Keldysh Institute of Applied Mathematics RAS

B State Research Institute of Aviation Systems (GosNIIAS)

1 ORCID: 0000-0003-3681-5212, bond@keldysh.ru

2 ORCID: 0000-0003-4765-6034, cod@fgosniias.ru

3 ORCID: 0000-0001-6460-7539, vlgal@gin.keldysh.ru

 

Abstract

This research addresses the problem of studying the cluster structure of multidimensional data volumes. The paper presents the results of numerical experiments on data volumes consisting of frequencies of joint use of adjectives and nouns. The data volumes were obtained from samples of Russian-language text collections. The aim of the research is to analyze the cluster structure of the studied volume and the semantic proximity of words within clusters and subclusters. The working hypothesis is that words with similar meanings should occur in approximately the same contexts. Consequently, in the feature space such words lie relatively close to each other, while words with different meanings lie farther apart. The research is carried out using elastic maps, which are effective tools for visual analysis of multidimensional data. Constructing elastic maps and their extensions in the space of the first three principal components makes it possible to determine the cluster structure of the studied multidimensional data volumes. The cluster structure of the considered data volume is analyzed, and the influence of transposing the initial data array is examined. Such analysis can be useful in countering negative verbal influences such as fake news, hidden propaganda, recruitment into sects, and verbal manipulation.

 

Keywords: Multidimensional Data, Visual Analysis, Elastic Maps, Frequencies of Joint Use, Cluster Structures.
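
The hypothesis stated in the abstract can be illustrated with a minimal sketch. It assumes a small, entirely hypothetical noun-by-adjective frequency table and uses plain principal component analysis via NumPy's SVD. It does not construct an elastic map, and the row normalization step is an assumption made for illustration, not a procedure taken from the paper. The function names `project_to_pcs` and `pairwise_distances` are likewise hypothetical.

```python
import numpy as np


def project_to_pcs(freq, n_components=3):
    """Project the rows of a frequency matrix onto the leading principal components."""
    x = np.asarray(freq, dtype=float)
    # Assumption: convert raw counts to row-wise relative frequencies so that
    # a word's overall frequency does not dominate its position.
    row_sums = x.sum(axis=1, keepdims=True)
    x = x / np.where(row_sums == 0, 1.0, row_sums)
    # Center every feature (column) before extracting principal directions.
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


def pairwise_distances(coords):
    """Euclidean distances between all pairs of rows."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical toy data: rows are nouns, columns are adjectives,
# entries are counts of joint use.
nouns = ["river", "lake", "engine", "motor"]
adjectives = ["deep", "cold", "powerful", "noisy"]
freq = np.array([
    [30, 20, 2, 1],
    [28, 25, 1, 0],
    [1, 3, 40, 22],
    [0, 2, 35, 30],
])

# Nouns as points in the space of the first three principal components.
noun_coords = project_to_pcs(freq)
noun_dist = pairwise_distances(noun_coords)

# Transposing the array treats adjectives as the objects and nouns as features.
adj_coords = project_to_pcs(freq.T)
adj_dist = pairwise_distances(adj_coords)
```

In this toy table, nouns that combine with similar adjectives end up closer to each other in the projected space than nouns that do not, which is the behavior the hypothesis predicts. The transposed call shows how the same array yields a point cloud of adjectives instead of nouns.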