Search for Extended Repeats in Genomes Based on the Spectral-Analytical Method
Pankratov A., Pyatkov M., Tetuev R., Nazipova N., Dedus F.F.
Institute of Mathematical Problems of Biology of the Russian Academy of Sciences
Moscow State University
Abstract. A spectral-analytical approach to identifying diverged extended repeats in genomic sequences is presented. The method is based on a multiscale integral estimate of the similarity of nucleotide sequences in the space of expansion coefficients of their GC- and GA-content curves in classical orthogonal bases. Conditions for optimal approximation are found that allow different types of repeats (direct, inverted, and tandem) to be detected automatically from the spectral similarity matrix. The method works equally well at different scales of data. It can detect fragments of segmental duplications and megasatellite blocks in a genome, as well as regions of synteny. It can also be used for detailed study of chromosome fragments (searching for diverged fragments with a moderate repeat-unit length).
Key words: comparison of genomes, approximation, similarity matrix, pattern recognition, megasatellites, interspersed repeats.
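
The idea of comparing fragments in the space of expansion coefficients can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the Legendre basis, the non-overlapping window, the expansion degree, the Euclidean distance, and all function names (gc_profile, legendre_spectrum, spectral_distance) are assumptions made for illustration only. The sketch uses the GC-content curve alone and relies on the parity property P_n(-x) = (-1)^n P_n(x), so that a reversed profile can be compared without recomputing its expansion.

```python
import numpy as np
from numpy.polynomial import legendre


def gc_profile(seq, window):
    """Mean GC fraction in consecutive non-overlapping windows (assumed scheme)."""
    flags = np.array([1.0 if ch in "GC" else 0.0 for ch in seq.upper()])
    n = len(flags) // window
    return flags[: n * window].reshape(n, window).mean(axis=1)


def legendre_spectrum(profile, degree):
    """Least-squares Legendre coefficients of a profile mapped onto [-1, 1]."""
    x = np.linspace(-1.0, 1.0, len(profile))
    return legendre.legfit(x, profile, degree)


def spectral_distance(a, b):
    """Return (direct, inverted) Euclidean distances between two spectra.

    Reversing a profile flips the sign of the odd-order coefficients,
    so the inverted comparison only needs a sign change on b.
    """
    signs = (-1.0) ** np.arange(len(b))
    return np.linalg.norm(a - b), np.linalg.norm(a - signs * b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fragment = "".join(rng.choice(list("ACGT"), size=2000))
    # Reverse complement: GC content is preserved, the profile is reversed.
    inverted_copy = fragment.translate(str.maketrans("ACGT", "TGCA"))[::-1]

    s1 = legendre_spectrum(gc_profile(fragment, 40), 8)
    s2 = legendre_spectrum(gc_profile(inverted_copy, 40), 8)
    direct, inverted = spectral_distance(s1, s2)
    print(f"direct distance:   {direct:.3e}")
    print(f"inverted distance: {inverted:.3e}")
```

Evaluating such distances for every pair of fragments would give one possible form of a similarity matrix in which direct and inverted repeats show up as small entries of the corresponding kind. How the published method weights coefficients, combines GC- and GA-content, and selects scales is not specified in this abstract.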