Synthetic Data: The Complete Series

Author

Caroline Morton

Date

April 9, 2026

I have spent the last year thinking and writing about synthetic data. The promise is straightforward: generate data that looks and behaves like the real thing, without exposing anything confidential. I write from the perspective of someone working in epidemiology and healthcare research, but the principles extend well beyond this area. Synthetic data is being used across finance, pharma, university research centres, and large corporate enterprises. What I have found is that the same tension drives adoption in almost every case: organisations want more data to work with, but the real data is sensitive. In healthcare and finance that sensitivity is about protecting individuals from being identified. After all, what could be more sensitive than your health record or bank statements? In commercial settings it is about protecting data that would be valuable to competitors.

The posts are grouped into sections and listed in a sensible reading order, especially if you are new to the topic. Each post also stands alone and can be read independently. The series is ongoing, and I will update this page as new posts are published.

Foundations

We start with an introduction to synthetic data, some of the main methods of generation at a high level, and the applications that are driving the demand for synthetic data.

Methods of generation

These posts deal with the methods of generating synthetic data. Each post describes a different method, its advantages and disadvantages, and its potential use cases and applications.

Applications

Here we cover the real-world use cases for synthetic data.

Evaluation

In these posts we cover some of the evaluation metrics that matter in synthetic data generation with regard to privacy, representativeness and utility.

Get in touch

This series is ongoing. If you work with synthetic data or are considering it for your research, I would be glad to hear from you.

