The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development

Tshilidzi Marwala, Eleonore Fournier-Tombs and Serge Stinckwich (2023). The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development. UNU Technology Brief. United Nations University.

Document type:
Report

  • Attached Files
    Name | Description | MIME type | Size
    UNU-TB_1-2023_The-Use-of-Synthetic-Data-to-Train-AI-Models.pdf | English PDF | application/pdf | 460.02KB
    UNU-TB_1-2023_Use-of-Synthetic-Data-to-Train-AI-Models_CN.pdf | Chinese PDF | application/pdf | 668.65KB
    UNU-TB_1-2023_Use-of-Synthetic-Data-to-Train-AI-Models_JP.pdf | Japanese PDF | application/pdf | 636.00KB
  • Sub-type Policy brief
    Author Tshilidzi Marwala
    Eleonore Fournier-Tombs
    Serge Stinckwich
    Title The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development
    Series Title UNU Technology Brief
    Volume/Issue No. 1
    Publication Date 2023-09
    Place of Publication Tokyo
    Publisher United Nations University
    Pages 5
    Language eng
    jpn
    Abstract Using synthetic, or artificially generated, data to train AI algorithms is a burgeoning practice with significant potential. It can address data scarcity, privacy, and bias issues, but it also raises concerns about data quality, security, and ethical implications. This issue is heightened in the global South, where data scarcity is much more severe than in the global North. Synthetic data can therefore address the problem of missing data, leading, in the best case, to better representation of populations in datasets and more equitable outcomes. However, synthetic data cannot be considered better than, or even equivalent to, actual data from the physical world. In fact, using synthetic data carries many risks, including cybersecurity risks, bias propagation, and increased model error. This policy brief proposes recommendations for the responsible use of synthetic data in AI training, along with guidelines to regulate that use.
    Copyright Holder United Nations University
    Copyright Year 2023
    Copyright type Creative commons
    ISBN 9789280891454
  • Access Statistics: 292 Abstract Views, 999 File Downloads
    Created: Mon, 04 Sep 2023, 16:16:47 JST by Powell, Daniel on behalf of UNU Centre