Analysis of Data Independence Using Nearest Neighbor Graphs

Kislitsyn, A.A.

doi:10.26583/sv.17.5.05

Научная визуализация, 2025, том 17, номер 5, страницы 38 - 51, DOI: 10.26583/sv.17.5.05

Analysis of Data Independence Using Nearest Neighbor Graphs

Автор: А.А. Кислицын¹

Keldysh Institute of Applied Mathematics RAS

¹ ORCID: 0000-0003-2388-0496, kislitsyn@kiam.ru

Аннотация

The article describes a new method for testing the independence of random data sets. This method uses representation of connections between data points in the form of nearest neighbor graphs and compares parameters of the resulting specific graph — such as the number of connected components and vertex degree distribution — to numerically derived critical statistics for random nearest neighbor graphs obtained by the author. The proposed method can be applied to various practical situations. It can be used to test the independence of random vectors in low-dimensional metric spaces. Such problems arise when analyzing data from physical measurements. Additionally, this approach is applicable for analyzing random points in high-dimensional spaces where direct numerical evaluations require exhaustive enumeration and are therefore impractical or impossible to obtain exactly. This problem relates to object classification characterized by many parameters. Moreover, proximity function between points may not necessarily be symmetric, which allows application of graph methods even in such cases. Alongside the task of sample testing, one could also consider comparing pseudo-random number generators by benchmarking structural graph statistics based on them. Considered probabilities of graph structure realization provide an independent set of criteria. For example, the number of fragments in some random graph might be typical for independent random variables while vertex degree distributions could differ significantly. This extends the applicability domain of statistical analysis. The paper presents a collection of model examples illustrating how the methodology works with respect to several types of practical scenarios mentioned above. A comparison of this method with other statistical approaches is provided. We emphasize that using graphs as visualization tools enables immediate identification of dependent elements within samples if there exist cluster centers represented by vertices with anomalously large degrees.

Ключевые слова: nearest neighbor graphs, statistics, data independence, PRNG, pi-number.

Научная визуализация

Электронный журнал открытого доступа

Национальный Исследовательский Ядерный Университет "МИФИ"

В данный момент текст статьи на русском языке недоступен.