Data Snack: Opportunities in generating synthetic financial data
- Collecting, organizing and sharing data can be a costly and risky process.
- FIs and Big Tech are looking into generating 'synthetic data', which mimics real data but removes the risk of privacy breaches caused by exposure of Personally Identifiable Information.

While data can be exceptionally useful for analytics and strategizing, mismanaging access to it can lead to significant security risks for both organizations and consumers. Personally Identifiable Information (PII) poses a particular challenge for organizations, which generally want to retain as much detail as they can without exposing customers to privacy risks.
One solution is synthetically generated data, which mimics real data sets but does not hold any PII. Moreover, synthetic data circumvents the labor and costs attached to data collection and organization, allowing teams to develop algorithms faster and with less red tape.
In the past year, companies like Microsoft, Google, and Amazon have all spoken about the importance of synthetic data and its use in their current architecture. San Diego-based startup and synthetic data creator Gretel.ai closed a $50 million Series B funding round in October, led by Anthos Capital. Its products, such as a privacy toolkit, safeguard synthetic data from adversarial attacks and enable teams to de-bias and anonymize their data sets, while also allowing teams to share data more securely.
JP Morgan’s AI research has developed the following model for generating synthetic data sets:
[Figure: flow diagram of the synthetic data generation process. Source: JP Morgan]
JP Morgan explains the flow diagram as follows:
Step 1: Compute metrics for the real data
Step 2: Develop a Generator (may be statistical methods or an agent-based simulation)
Step 3: (Optional) Calibrate the Generator using the real data
Step 4: Run the Generator to generate synthetic data
Step 5: Compute metrics for the synthetic data
Step 6: Compare the metrics of the real data and synthetic data
Step 7: (Optional) Refine the Generator to improve against comparison metrics
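To make the seven steps concrete, here is a minimal Python sketch of the loop. It is an illustration under stated assumptions, not JP Morgan's implementation: the "real" data is a made-up column of transaction amounts, the Generator is a simple Gaussian model (one possible "statistical method"), and the names `compute_metrics`, `GaussianGenerator`, `compare_metrics` and `refine` are hypothetical. Matching summary metrics alone does not guarantee privacy.

```python
import random
import statistics


def compute_metrics(values):
    """Steps 1 and 5: summarize a data set with two simple metrics."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


class GaussianGenerator:
    """Step 2: a toy statistical generator (hypothetical, for illustration)."""

    def __init__(self, mean=0.0, stdev=1.0, seed=0):
        self.mean = mean
        self.stdev = stdev
        self._rng = random.Random(seed)

    def calibrate(self, real_metrics):
        """Step 3 (optional): fit the generator's parameters to the real metrics."""
        self.mean = real_metrics["mean"]
        self.stdev = real_metrics["stdev"]

    def generate(self, n):
        """Step 4: draw n synthetic values from the fitted distribution."""
        return [self._rng.gauss(self.mean, self.stdev) for _ in range(n)]


def compare_metrics(real_metrics, synthetic_metrics):
    """Step 6: absolute gap between real and synthetic values of each metric."""
    return {k: abs(real_metrics[k] - synthetic_metrics[k]) for k in real_metrics}


def refine(generator, real_metrics, synthetic_metrics):
    """Step 7 (optional): nudge the parameters to close the observed gaps."""
    generator.mean += real_metrics["mean"] - synthetic_metrics["mean"]
    generator.stdev *= real_metrics["stdev"] / synthetic_metrics["stdev"]


# Stand-in for real data: simulated amounts, not actual customer records.
source_rng = random.Random(42)
real_data = [source_rng.gauss(50.0, 12.0) for _ in range(2000)]

real_metrics = compute_metrics(real_data)                 # Step 1
generator = GaussianGenerator(seed=7)                     # Step 2
generator.calibrate(real_metrics)                         # Step 3
synthetic_data = generator.generate(len(real_data))       # Step 4
synthetic_metrics = compute_metrics(synthetic_data)       # Step 5
gaps = compare_metrics(real_metrics, synthetic_metrics)   # Step 6

TOLERANCE = 0.5  # assumed acceptance threshold, chosen arbitrarily
if max(gaps.values()) > TOLERANCE:
    refine(generator, real_metrics, synthetic_metrics)    # Step 7
    synthetic_data = generator.generate(len(real_data))

print("metric gaps:", gaps)
```

In practice the metrics would be richer (correlations between columns, distribution distances, privacy tests) and the Generator could be an agent-based simulation, but the compute, generate, compare and refine cycle stays the same.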
In its research on the subject, JP Morgan found that tabular data in retail banking and time series of market microstructure data are the data types most in need of protection by financial institutions.
Tune in to our Data Day Conference on the 21st of June to find out more about how data is changing the fintech landscape.