ORCID
Eman Sayed: https://orcid.org/0000-0001-6121-8458
Sara M. Mosaad: https://orcid.org/0009-0003-4597-6445
Keywords
Generative artificial intelligence, Medical data augmentation, Electronic health records, Clinical narratives, Diffusion models, Privacy-preserving synthetic data
Article Type
Original Article
Abstract
Generative Artificial Intelligence has emerged as a transformative tool in healthcare by addressing persistent limitations in medical data availability, including annotation scarcity, privacy restrictions, and demographic underrepresentation. Instead of relying solely on real-world samples, generative models synthesize clinically realistic data that support model robustness, simulate rare conditions, and improve representation across patient groups. This review presents a comprehensive synthesis of four widely adopted generative model classes: Generative Adversarial Networks, Variational Autoencoders, Diffusion Models, and Transformer-based architectures. These models are examined across four key clinical data categories: radiological imaging, structured electronic health records, physiological time series, and free-text clinical narratives. The review organizes the literature using a two-dimensional framework that integrates model architecture with data modality, offering a structured understanding of how modality-specific characteristics such as spatial detail, temporal structure, semantic richness, and privacy sensitivity influence generative model design and application. In addition to technical evaluation, the review identifies several unresolved challenges, including the lack of standardized fidelity metrics, limited integration of clinical validation, and underdeveloped practices for fairness auditing and privacy preservation. The ethical and regulatory context also remains fragmented, particularly in high-risk clinical applications where synthetic data may affect safety, accountability, and patient outcomes. This work contributes a unified framework that connects technical advances with practical, ethical, and policy concerns. By synthesizing empirical findings from the literature, exposing methodological gaps, and offering guidance for responsible implementation, this review supports future research and development of generative models that are equitable, clinically valid, and aligned with healthcare priorities.
How to Cite
Sayed, Eman and Mosaad, Sara M. (2025) "Generative AI for Medical Data Augmentation: A Two-Dimensional Review of Techniques, Modalities, and Governance Challenges," Sustainable Machine Intelligence Journal: Vol. 12: Iss. 1, Article 4.
Available at: https://smij.sciencesforce.com/journal/vol12/iss1/4
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.