Generative AI for Medical Data Augmentation: A Two-Dimensional Review of Techniques, Modalities, and Governance Challenges

Eman Sayed, Department of Decision Support, Faculty of Computers and Informatics, Zagazig University, Harayah Raznah, Zagazig, Sharqiyah 44519, Egypt
Sara M. Mosaad, Information System Department, Faculty of Commerce and Business, Helwan University, Ain Helwan, Cairo 11795, Egypt

ORCID

Eman Sayed: https://orcid.org/0000-0001-6121-8458

Sara M. Mosaad: https://orcid.org/0009-0003-4597-6445

Keywords

Generative artificial intelligence, Medical data augmentation, Electronic health records, Clinical narratives, Diffusion models, Privacy-preserving synthetic data

Article Type

Original Article

Abstract

Generative Artificial Intelligence has emerged as a transformative tool in healthcare by addressing persistent limitations in medical data availability, including annotation scarcity, privacy restrictions, and demographic underrepresentation. Instead of relying solely on real-world samples, generative models synthesize clinically realistic data that improve model robustness, simulate rare conditions, and broaden representation across patient groups. This review presents a comprehensive synthesis of four widely adopted generative model classes: Generative Adversarial Networks, Variational Autoencoders, Diffusion Models, and Transformer-based architectures. These models are examined across four key clinical data categories, namely radiological imaging, structured electronic health records, physiological time series, and free-text clinical narratives. The review organizes the literature using a two-dimensional framework that pairs model architecture with data modality, showing how modality-specific characteristics such as spatial detail, temporal structure, semantic richness, and privacy sensitivity shape generative model design and application. Beyond technical evaluation, the review identifies several unresolved challenges, including the lack of standardized fidelity metrics, limited integration of clinical validation, and underdeveloped practices for fairness auditing and privacy preservation. The ethical and regulatory context also remains fragmented, particularly in high-risk clinical applications where synthetic data may affect safety, accountability, and patient outcomes. This work contributes a unified framework that connects technical advances with practical, ethical, and policy concerns.
By synthesizing empirical findings from the literature, exposing methodological gaps, and offering guidance for responsible implementation, this review supports future research and development of generative models that are equitable, clinically valid, and aligned with healthcare priorities.

How to Cite

Sayed, Eman and Mosaad, Sara M. (2025) "Generative AI for Medical Data Augmentation: A Two-Dimensional Review of Techniques, Modalities, and Governance Challenges," Sustainable Machine Intelligence Journal: Vol. 12: Iss. 1, Article 4.
DOI: https://doi.org/10.63689/3005-3617.1066