Differentially Private Synthetic Data
This project was completed as part of the Data Privacy course at UVM during the Fall of 2018. It investigates generating a synthetic dataset from a real one with measurable differential privacy constraints applied. The implementation uses differentially private fast correlation to learn a directed acyclic dependency graph between the variables in the original dataset. Next, differentially private conditional marginals are generated according to the structure of that graph. Finally, synthetic samples are generated by sampling from the learned conditional marginal distributions. A slide deck containing my presentation can be found below.
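The last two steps (noisy conditional marginals, then sampling) can be illustrated with a minimal sketch. It is not the project's actual code and rests on several assumptions: the dependency graph is already given, every variable has at most one parent, columns are integer-coded categories, the Laplace mechanism is used with sensitivity 1 per count table, and the privacy budget is split evenly across tables. The names `noisy_marginal` and `fit_and_sample` are hypothetical.

```python
import numpy as np


def noisy_marginal(counts, epsilon, rng):
    """Add Laplace noise to a count table and normalize its last axis to probabilities.

    Assumes each record changes exactly one cell, so the sensitivity is 1.
    """
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)  # negative counts are not valid probabilities
    totals = noisy.sum(axis=-1, keepdims=True)
    # Fall back to a uniform distribution where every noisy count was clipped to zero.
    uniform = np.full_like(noisy, 1.0 / noisy.shape[-1])
    safe_totals = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, noisy / safe_totals, uniform)


def fit_and_sample(data, parents, cardinalities, epsilon, n_samples, seed=0):
    """Learn noisy (conditional) marginals along a given graph, then sample from them.

    data          : int array of shape (n_rows, n_cols), values in [0, cardinality)
    parents       : dict mapping column -> parent column or None, listed in
                    topological order (assumption: at most one parent per column)
    cardinalities : number of categories per column
    epsilon       : total privacy budget, split evenly across the tables
    """
    rng = np.random.default_rng(seed)
    order = list(parents)
    eps_each = epsilon / len(order)  # sequential composition over the tables

    tables = {}
    for col in order:
        par = parents[col]
        if par is None:
            counts = np.bincount(data[:, col], minlength=cardinalities[col])
        else:
            counts = np.zeros((cardinalities[par], cardinalities[col]), dtype=np.int64)
            np.add.at(counts, (data[:, par], data[:, col]), 1)
        tables[col] = noisy_marginal(counts, eps_each, rng)

    # Ancestral sampling: draw each column given the already-sampled parent value.
    synth = np.zeros((n_samples, data.shape[1]), dtype=np.int64)
    for col in order:
        par = parents[col]
        for i in range(n_samples):
            probs = tables[col] if par is None else tables[col][synth[i, par]]
            synth[i, col] = rng.choice(cardinalities[col], p=probs)
    return synth


# Toy usage: column 1 depends on column 0.
_rng = np.random.default_rng(1)
a = _rng.integers(0, 2, size=500)
b = (a + _rng.integers(0, 2, size=500)) % 3
data = np.column_stack([a, b])
synth = fit_and_sample(data, parents={0: None, 1: 0}, cardinalities=[2, 3],
                       epsilon=1.0, n_samples=200)
```

A smaller `epsilon` adds more noise to each table, so the synthetic samples follow the original distributions less closely. The structure-learning step is left out of this sketch.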
This project was designed around the constraints and the dataset provided by the NIST 2018 differential privacy challenge.
As a separate project a few years later, I also investigated the applicability of generating a synthetic version of the ABCD dataset with the bioinformatics working group of ABCD. In this follow-up project, I tried a number of open-source implementations for generating synthetic data and then performed ML-based experiments on the resulting data. The slide deck I presented to this group is included below.