Nvidia Reportedly Bought a Synthetic Data Firm. So What's Synthetic Data?

Chipmaker Nvidia is leaning further into producing tools for generative AI developers with the acquisition of synthetic data firm Gretel for more than $320 million, according to a report from Wired on Wednesday.

The move comes as generative AI firms struggle to find enough data to train and improve their models, increasing the need to generate data artificially.

According to the report, Gretel's employees will be folded into Nvidia. Gretel, which produces synthetic or simulated data for AI model training, will bolster Nvidia's offerings for AI developers.

An Nvidia spokesperson declined to comment on the report.

Why synthetic data matters

Training generative AI models, such as the large language models behind OpenAI's ChatGPT, requires a lot of data. Real-world data can pose problems for AI developers -- namely, it can be noisy, and there isn't enough of it.

AI firms are running up against the limit of training data that is freely available to them, leading to conflicts over whether they can use copyrighted content. Hundreds of actors, writers and directors submitted an open letter to the Trump administration's Office of Science and Technology Policy to raise their concerns about the use of copyrighted data. OpenAI, meanwhile, is petitioning the government to allow greater access to copyrighted material to train AI models, arguing that American companies will otherwise be left behind by China.

Synthetic data also has value in protecting private information. Gretel says its synthetic data can be used to train models and tools without exposing sensitive or personal information -- for example, health care data that doesn't identify individual people or potentially violate privacy laws.

There are also concerns about using such data in model training. An overreliance on information that isn't rooted in reality can increase the likelihood that a model will get things wrong. If the issue gets bad enough, it can lead to what's known as model collapse, in which the model becomes so inaccurate that it is useless.
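
To get a feel for how model collapse can happen, here is a toy sketch in Python. It is not how large language models are trained, and it has nothing to do with Gretel's or Nvidia's tools. It assumes a very simple "model" (a bell curve described by an average and a spread) that is refit, generation after generation, only on data produced by the previous generation. The function name `simulate_collapse` and its parameters are made up for this illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a normal distribution to samples drawn from the previous fit.

    Returns the fitted standard deviation (spread) at each generation.
    """
    rng = random.Random(seed)
    # Generation 0 stands in for the real-world data distribution (an assumption).
    mean, std = 0.0, 1.0
    history = [std]
    for _ in range(generations):
        # Generate synthetic data from the current model...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...then fit the next model on that synthetic data alone.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = simulate_collapse()
print(f"spread at generation 0: {history[0]:.3f}")
print(f"spread at generation {len(history) - 1}: {history[-1]:.3g}")
```

In this setup the spread tends to shrink with each generation, because every refit sees only a small sample of the previous model's output and loses a bit of its variety. After enough rounds the model produces nearly identical values, a simplified picture of the loss of accuracy described above.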