Volume 20   Issue 2   Year 2025
A new type of tandem repeats in the genome of Mus musculus: GC-rich megasatellites

Nafisa Nazipova, Ruslan Tetuev

Institute of Mathematical Problems of Biology RAS, Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia

Abstract. With the advance of third-generation sequencing (TGS) technologies, the technical difficulties of sequencing the repetitive regions of eukaryotic genomes have been overcome and, as a result, telomere-to-telomere (T2T) assemblies of a number of model genomes, including that of the house mouse Mus musculus, have been completed. This technological breakthrough has opened up new possibilities for investigating the structural and functional organization of complex genomes. In our studies, we verified the tandem repeats previously identified in the mouse genome and discovered new regions of periodicity, which we call GC-rich megasatellites. This paper describes two megasatellites found in the intergenic regions of chromosome 6 of the mouse genome, each with its own characteristics. These extended tandem repeats have an average GC content of 56% and contain multiple micro- and minisatellite tracks, which cover more than a third of each megasatellite copy. The first repeat region comprises 83 copies and the second 71 copies, with average copy lengths of 4320 and 2440 bp, respectively. A characteristic feature of both tandem repeats is the presence of extended conserved regions alternating with islands of variability (micro- and minisatellite tracks). In the first megasatellite, the islands of variability are formed by the tracks (ACCCC)n, (AAACG)n, (AAAC)n, (GTCT)n, (GT)n, (CTTC)n, (GAGAAG)n and (TCC)n; in the second, there are three islands of variability formed by the homopurine-homopyrimidine tracks (CT)n, (CA)n and (CT)n. The first megasatellite contains the coding sequence of a proline-rich protein at the 3' end of each copy, and its tandem copies are heterogeneous in GC content: 10% of the pattern at the 5' end and 20% at the 3' end contain more than 65% GC, whereas the inner 70% of the pattern has ~50% GC. The second megasatellite is more GC-rich than the first, and its copies are more uniform in length, GC content, and oligonucleotide-track coverage.
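
The two measurements the abstract relies on, GC content along a repeat copy and detection of oligonucleotide tracks such as (ACCCC)n, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 10%/70%/20% split passed as a default argument, the restriction to perfect (mismatch-free) repeats, and the toy sequences are all assumptions made for illustration only.

```python
import re


def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def segment_gc(copy_seq: str, fractions=(0.10, 0.70, 0.20)) -> list:
    """GC percentage of consecutive segments of one repeat copy.

    `fractions` gives the relative length of each segment from the 5' end
    to the 3' end (hypothetical default: 5' 10%, inner 70%, 3' 20%).
    """
    n = len(copy_seq)
    bounds = [0]
    acc = 0.0
    for f in fractions:
        acc += f
        bounds.append(round(acc * n))
    bounds[-1] = n  # guard against floating-point drift at the 3' end
    return [gc_content(copy_seq[a:b]) for a, b in zip(bounds, bounds[1:])]


def find_tracts(seq: str, motif: str, min_repeats: int = 3) -> list:
    """Return (start, end, repeat_count) for perfect tandem runs of `motif`.

    Only exact, uninterrupted repeats are reported; real satellite tracks
    usually contain mismatches and would need a tolerant search.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]


# Toy 100-bp "copy": GC-rich ends around a ~50% GC interior (made-up data).
toy_copy = "GC" * 5 + "AG" * 35 + "CCCCG" * 4
print(segment_gc(toy_copy))
print(find_tracts("TTACCCCACCCCACCCCGG", "ACCCC"))
```

Applied to real copies, `segment_gc` would be run on each aligned copy of a megasatellite and the per-segment values averaged, while `find_tracts` would be called once per motif of interest.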

Key words: TGS, Mus musculus T2T genome, chromosome 6, extended tandem repeats, GC-rich megasatellites, microsatellite tracks, minisatellite tracks

 
Original Article
Nafisa Nazipova, Ruslan Tetuev. A new type of tandem repeats in the genome of Mus musculus: GC-rich megasatellites. Mathematical Biology and Bioinformatics. 2025;20(2):416-436. doi: 10.17537/2025.20.416
(published in Russian)

Translation into English
Nafisa Nazipova, Ruslan Tetuev. A new type of tandem repeats in the genome of Mus musculus: GC-rich megasatellites. Mathematical Biology and Bioinformatics. 2025;20(Suppl.):t44-t62. doi: 10.17537/2025.20.t44


 

  Copyright IMPB RAS © 2005-2026