Literary heritage of the 19th–20th centuries: classification of raster images for intellectual analysis and thematic modeling of the corpus of handwritten texts

E.N. Penskaya, L.V. Khachaturian

UDC 82(0.032):004

DOI 10.20339/PhS.5-23.160     


Penskaya Elena N.,

Doctor of Philology, Professor

National Research University “Higher School of Economics”;

Head of the Group of the Center for Interdisciplinary Research

Moscow Institute of Physics and Technology

ORCID: 0000-0003-2469-584X


Khachaturian Lyubov V.,

Candidate of Culturology, Associate Professor

National Research University “Higher School of Economics”;

Senior Scientific Researcher of the Center for Interdisciplinary Research

Moscow Institute of Physics and Technology

ORCID: 0000-0002-2689-5186



The article examines current trends in working with digital forms of the handwritten heritage of Russian literature from the second half of the 19th to the mid-20th century. The formation of virtual archives is analyzed as the gradual accumulation of research “big data”: an unrecognized information array of raster documents containing tens of thousands of digital forms of archival documents. The authors propose new approaches to classifying raster images of handwritten documents for use in intelligent analysis systems, experimental methods for visualizing archival documents, and previously unused search engine capabilities. Particular attention is paid to the architectonics of the manuscript: the transition from the graphic elements of a raster image to semantic ones, which makes it possible to apply data mining techniques to an unrecognized data array.

Keywords: manuscript heritage, digital form, bitmap image, new methods, manuscript architectonics, big data, data mining.



