by pgao 1964 days ago
Former self-driving engineer here. I'm also pretty skeptical about synthetic data. For the scenario you described, it turns out that if you drive enough, you'll eventually see some examples of ambulances at night in the rain. If it's really that rare, it's often easier to rent your own ambulance, drive around and do some staged data collection, and annotate the results than it is to set up a synthetic data pipeline.

At the end of the day, even in vision applications, real data is always better than synthetic data if you can get it. Things like sensor noise or interference are hard to replicate in synthetic data. Most teams turn to synthetic data for simulation purposes or as a last resort.