Volume 21   Issue 1   Year 2026
From Marginals to Microdata: A cGAN-based Projection for Population Synthesis

Konstantin Novikov¹, Nina Gorodnova², Alexei Romanyukha¹

¹ Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences, Moscow, Russia
² MerlionTech LLC, Krasnogorsk, Moscow, Russia

Abstract. We present a framework for synthetic population generation that uses conditional Generative Adversarial Networks (cGANs) to infer individual-level health microdata from population-level marginals. The approach synthesizes realistic microdata for target populations where only aggregate statistics, such as age-sex distributions and disease prevalence, are available. To address data scarcity and privacy constraints, we develop a cGAN architecture that captures complex intervariable relationships (e.g., age-disease and disease-disease dependencies) in a source population with microdata and transfers them to a target population described only by marginals. A hybrid loss function enforces fidelity to the target marginals while preserving the epidemiological realism of the generated samples. We evaluated the method on the 2023 Behavioral Risk Factor Surveillance System (BRFSS) dataset from the United States and found strong alignment with real-world distributions across multiple states. The model accurately replicates disease co-occurrence patterns and age-disease correlations, even though these were not part of the conditioning data. Our results suggest that the method can enable scalable, privacy-preserving synthetic data generation, with promising applications in public health modeling and agent-based simulation, particularly in regions lacking detailed individual-level data.
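The abstract does not specify the form of the hybrid loss. As a rough illustration only, the sketch below assumes it is an adversarial generator term plus a weighted penalty on the gap between the batch means of the generated attributes and the target marginals. The function names, the squared-error penalty, the non-saturating adversarial term, and the weight are all assumptions made for this example, not the authors' formulation.

```python
import math

def marginal_penalty(samples, target_marginals):
    """Mean squared gap between per-attribute batch means and target marginals.

    Assumed form of the marginal-fidelity term; the paper may use another.
    """
    n = len(samples)
    k = len(target_marginals)
    batch_means = [sum(row[j] for row in samples) / n for j in range(k)]
    return sum((m - t) ** 2 for m, t in zip(batch_means, target_marginals)) / k

def adversarial_term(disc_scores, eps=1e-12):
    """Non-saturating generator loss, -mean(log D(G(z))), an assumed choice."""
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)

def hybrid_loss(samples, disc_scores, target_marginals, weight=10.0):
    """Adversarial realism term plus weighted marginal-fidelity term.

    `weight` is a hypothetical hyperparameter balancing the two terms.
    """
    return adversarial_term(disc_scores) + weight * marginal_penalty(
        samples, target_marginals
    )

# Toy batch: 4 synthetic individuals, 2 binary attributes
# (hypothetical columns, e.g. two disease indicators).
batch = [[1, 0], [0, 0], [1, 1], [0, 1]]
target = [0.5, 0.5]            # target prevalence for each attribute
scores = [0.5, 0.5, 0.5, 0.5]  # discriminator outputs for the batch

penalty = marginal_penalty(batch, target)
loss = hybrid_loss(batch, scores, target)
print(penalty, loss)
```

In this toy batch the attribute means already equal the target prevalences, so the penalty vanishes and only the adversarial term contributes. A batch whose prevalences drift from the targets would be penalized in proportion to `weight`.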

 

Key words: synthetic population, deep learning, cGAN, microdata, individual-level database

Original Article
Konstantin Novikov, Nina Gorodnova, Alexei Romanyukha. From Marginals to Microdata: A cGAN-based Projection for Population Synthesis. Mathematical Biology and Bioinformatics. 2026;21(1):187-199. doi: 10.17537/2026.21.187
(published in English)


 

  Copyright IMPB RAS © 2005-2026