Implementation at UCLH using sqlsynthgen

SQLSYNTHGEN (SSG), a portmanteau of ‘SQL Synthetic Generator’, is a software tool designed by a team (May Yong, Ian Stenson, and Markus Hauru) working under Professor Carsten Maple at the Alan Turing Institute.

SSG creates synthetic replicas of relational (SQL) databases. By default it reliably replicates the structure of the database, but it populates that structure only with random (fake) data generated by another tool, Mimesis.

Fake data refers to placeholder values that are neither useful nor sensitive; they simply occupy the space where real data would normally be located.
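To make the idea of fake data concrete, the sketch below fills a small table structure with random placeholder values. It is only an illustration: it uses Python's standard `random` module as a stand-in for Mimesis, and the `SCHEMA` dictionary, column names, and the functions `fake_value` and `fake_rows` are hypothetical and not part of SSG.

```python
import random
import string

# Hypothetical table structure (column name -> type); not taken from any real schema.
SCHEMA = {"patient_id": "integer", "surname": "text", "admitted": "date"}


def fake_value(col_type, rng):
    """Return a random placeholder value of the requested type."""
    if col_type == "integer":
        return rng.randint(1, 10**6)
    if col_type == "text":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if col_type == "date":
        # Days are capped at 28 so every generated date is valid.
        year = rng.randint(2000, 2023)
        month = rng.randint(1, 12)
        day = rng.randint(1, 28)
        return f"{year:04d}-{month:02d}-{day:02d}"
    raise ValueError(f"unsupported column type: {col_type}")


def fake_rows(schema, n, seed=0):
    """Generate n rows that match the structure but carry no real information."""
    rng = random.Random(seed)
    return [
        {col: fake_value(col_type, rng) for col, col_type in schema.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    for row in fake_rows(SCHEMA, 3):
        print(row)
```

The rows have the right columns and types, so they can exercise code that expects the real table, but their values are unrelated to any source data.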

We believe that two key features of SSG are crucial to our work.

  1. The decision making is transparent. A single file written in human-readable code is used both to configure the tool and to document where a user decides to improve the fidelity of the synthetic data by learning from the source data.

  2. Statistical Disclosure Control decisions are part of the workflow. They are made using an approach called differential privacy, which provably defends against re-identification attacks.[^5]
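As a rough illustration of the second point, the sketch below applies the Laplace mechanism, a standard differential privacy technique, to a simple count. It is not SSG's implementation, and we make no claim about which mechanism SSG uses internally; the function `dp_count`, its parameters, and the example ages are hypothetical. The privacy parameter `epsilon` controls the trade-off: smaller values add more noise and give stronger protection.

```python
import random


def dp_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy predicate.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1 / epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 67, 45, 71, 80, 34, 66, 59]  # made-up example values
    print(dp_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng))
```

Each call returns the true count plus random noise, so a single released statistic reveals only a bounded amount about whether any one individual is in the data.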