Nvidia Reportedly Bought a Synthetic Data Firm. So What's Synthetic Data?

Generative AI models must be trained on a lot of information, and sometimes you have to make it up.

Jon Reed, Managing Editor
Jon covers artificial intelligence. He previously led CNET's home energy and utilities category, with a focus on energy-saving advice, thermostats, and heating and cooling. Jon has more than a decade of experience writing and reporting, including as a statehouse reporter in Columbus, Ohio, a crime reporter in Birmingham, Alabama, and as a mortgage and housing market editor for Time's former personal finance brand, NextAdvisor. When he's not asking people questions, he can usually be found half asleep trying to read a long history book while surrounded by multiple cats. You can reach him at joreed@cnet.com
Expertise: Artificial intelligence, home energy, heating and cooling, home technology.

Nvidia CEO Jensen Huang speaks during the company's 2025 GPU Technology Conference, or GTC, in San Jose, California. 

David Paul Morris/Bloomberg via Getty Images

Chipmaker Nvidia is leaning further into producing tools for generative AI developers with the acquisition of synthetic data firm Gretel for more than $320 million, according to a report from Wired on Wednesday.

The move comes as generative AI firms struggle to find enough data to train and improve their models, increasing the need to generate data. 

According to the report, Gretel's employees will be folded into Nvidia. Gretel, which produces synthetic or simulated data for AI model training, will bolster Nvidia's offerings for AI developers.

An Nvidia spokesperson declined to comment on the report.

Why synthetic data matters

Training generative AI models like OpenAI's ChatGPT, a large language model, requires a lot of data. Real-world data can pose problems for AI developers -- namely, it can be noisy, and there isn't enough of it.

AI firms are running up against the limit of training data that is freely available to them, leading to conflicts over whether they can use copyrighted content. Hundreds of actors, writers and directors submitted an open letter to the Trump administration's Office of Science and Technology Policy to raise their concerns about the use of copyrighted data. OpenAI, meanwhile, is petitioning the government to allow greater access to copyrighted material for training AI models, arguing that American companies will otherwise be left behind by China.

Synthetic data also has value in protecting private information. Gretel says its synthetic data can be used to train models and tools without exposing sensitive or personal information -- for example, health care data that doesn't identify individual people or potentially violate privacy laws.

There are concerns about using such data in model training. Overreliance on information that isn't rooted in reality can increase the likelihood that a model will get things wrong. If that gets bad enough, it can lead to what's known as model collapse, when the model becomes so inaccurate that it is useless.