ISSN 2079-3537
Scientific Visualization
Issue Year: 2015
Quarter: 2
Volume: 7
Number: 2
Pages: 50 - 72
Article Name: GRAPHICAL APPROACH TO THE PROBLEM OF FINDING SIMILAR TEXTS
Authors: V.L. Yevseyev (Russian Federation), G.G. Novikov (Russian Federation)
Address: V.L. Yevseyev
VLEvseev@mephi.ru
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russian Federation

G.G. Novikov
GGNovikov@mephi.ru
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russian Federation
Abstract: This work presents one possible approach to building a text comparison system, motivated by the problem of comparing the texts of regulatory documents.
The proposed method finds regulatory documents that contain fragments similar to a query entered by the user. An analysis of the state of the art in fuzzy full-text search shows that the available search methods can be grouped by the type of retrieval system that implements them: information retrieval systems and data retrieval systems.
To solve the problem of searching for texts by similarity, the concept of "comparison" is introduced, which amounts to finding documents in an information retrieval database that are, to some degree, similar to the original search pattern. The proposed method rests on the assumption that the similarity of documents is determined by the closeness of their images in the form of texts, without semantic analysis of their content. As a result, the dimension of the problem is reduced by several orders of magnitude and becomes tractable on commonly available computer equipment (personal computers). The overall structure of the search algorithm was developed. To solve the problem of searching for similar texts, a search dictionary, podlogar, and domain-forming chains for fuzzy full-text search were built. This yielded a set of domain-forming chains of words suitable for assessing the relevance of the examined text to the search template.
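The idea of judging similarity from the surface form of texts, without semantic analysis, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define how its chains of words are constructed or scored, so the sketch assumes that a chain is a run of n consecutive words and that closeness is measured by the Jaccard overlap of two chain sets. The names word_chains, similarity and rank, the parameter n, and the sample documents are all hypothetical.

```python
import re


def word_chains(text, n=2):
    """Return the set of n-word chains (consecutive word runs) in text.

    Assumption: a "chain" is a plain word n-gram; the article may define it differently.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Text shorter than one chain: treat the whole text as a single chain.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(query, document, n=2):
    """Jaccard overlap of the chain sets of two texts, from 0.0 to 1.0.

    Assumption: Jaccard overlap stands in for the article's relevance measure.
    """
    q = word_chains(query, n)
    d = word_chains(document, n)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank(query, documents, n=2):
    """Return (score, name) pairs for a dict of name -> text, best match first."""
    scored = [(similarity(query, text, n), name) for name, text in documents.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, used only to exercise the functions.
    docs = {
        "doc1": "Fire safety rules for residential buildings",
        "doc2": "Tax code of the federation",
    }
    for score, name in rank("fire safety rules for buildings", docs):
        print(f"{name}: {score:.2f}")
```

Because only word sequences are compared, the cost depends on the number of chains rather than on any model of meaning, which is the kind of reduction in problem size the abstract describes.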
Language: English

