Differential privacy framework for generating synthetic fMRI data with generative adversarial networks
Daafane, Hiba (2024-12-22)
This publication is protected by copyright. The work may be read and printed for personal use. Commercial use is prohibited.
open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe202501133248
Abstract
Neuroimaging, particularly functional magnetic resonance imaging (fMRI), is one of the most significant tools in medical and cognitive research, as access to high-quality neuroimaging datasets provides unparalleled insight into brain activity and function. Combining fMRI with artificial intelligence has broadened its scope even further, enabling advanced diagnostic tools and predictive models for neurological and psychological disorders. However, the sensitive nature of such data, coupled with strict privacy regulations, significantly limits its accessibility and hinders collaborative research efforts.
Synthetic data has emerged as a valuable tool for generating artificial datasets that replicate the statistical properties of sensitive datasets with a reduced risk of privacy breaches. Taking this as its starting point, this thesis builds on the work of Zheng et al., who used Generative Adversarial Networks (GANs) to generate synthetic task-conditioned fMRI images, and integrates Differential Privacy (DP) into their approach to provide a quantifiable privacy guarantee while preserving utility for downstream machine learning tasks. In this work, DP was integrated into the ICW-fMRI-GAN using the Opacus privacy engine, and the research investigated three core questions: the impact of DP on synthetic sample quality, evaluated through the Inception Score (IS); its effect on model performance and classification accuracy when predicting cognitive tasks from the images; and the degree of privacy protection achieved.
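The core mechanism that Opacus applies during training is DP-SGD: each example's gradient is clipped to a fixed L2 norm and calibrated Gaussian noise is added to the aggregate before the optimizer step. The sketch below illustrates that mechanism in isolation with NumPy; the function name and parameter values are hypothetical, not taken from the thesis or from Opacus itself.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step: clip each per-example gradient to
    clip_norm, sum the clipped gradients, add Gaussian noise with standard
    deviation noise_multiplier * clip_norm, and average over the batch."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clipping bound;
        # gradients already within the bound are left unchanged.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Toy batch of two per-example gradients (hypothetical values).
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
# With noise_multiplier=0 the result is just the clipped average: the first
# gradient (norm 5) is rescaled to norm 1, the second (norm 0.5) is kept.
noisy_avg = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0)
```

In a GAN setting, this treatment is typically applied only to the discriminator, since it is the component that directly touches the real (private) fMRI data; the generator inherits the privacy guarantee through post-processing.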
The experiments conducted in this work involve two medical institutions of differing sizes and resources, for which a DP-wise access protocol is proposed as a potential solution for effective data sharing and research collaboration. The results demonstrate that combining real and DP synthetic data achieves competitive predictive accuracy while offering meaningful privacy guarantees. The work also underscores the need for future research to refine DP mechanisms for high-dimensional data, such as brain images, and to develop synthetic datasets capable of maintaining sufficient utility while preserving patients' privacy.