| HN Mirror

by hackerlight 1158 days ago

> AI can generate as much synthetic data as we need, on demand.

That doesn't work in the majority of domains. To generate useful data, you need to know the generating process (e.g. game rules) and build a realistic simulation environment that emulates it. Both of these are out of reach for most applications.
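To make that concrete, here is a minimal sketch of the easy case, assuming a toy game where the rules are fully known. The game (single-pile Nim: take 1 to 3 stones, whoever takes the last stone wins) and all function names are made up for illustration. Because the rules fit in a few lines, you can simulate as many labeled games as you like:

```python
import random

def legal_moves(stones):
    # The complete "generating process" of this toy game:
    # take 1, 2, or 3 stones, never more than remain.
    return [n for n in (1, 2, 3) if n <= stones]

def play_random_game(start_stones, rng):
    """Simulate one game of uniformly random play.

    Returns (history, winner), where history is a list of
    (stones_before_move, player, move) tuples.
    """
    if start_stones < 1:
        raise ValueError("start_stones must be at least 1")
    stones, player = start_stones, 0
    history = []
    while True:
        move = rng.choice(legal_moves(stones))
        history.append((stones, player, move))
        stones -= move
        if stones == 0:
            return history, player  # taking the last stone wins
        player = 1 - player

def generate_dataset(num_games, start_stones=10, seed=0):
    """Build synthetic rows of (stones, move, mover_won) from simulated games."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_games):
        history, winner = play_random_game(start_stones, rng)
        for stones, player, move in history:
            rows.append((stones, move, int(player == winner)))
    return rows

rows = generate_dataset(num_games=1000)
```

The whole thing hinges on `legal_moves` and the win condition being exact. For most real applications there is no equivalent you can write down, so there is nothing trustworthy to simulate from.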

I believe the next large step will be multi-modal, where text is contextualized by video, so the LLM can concretize what "sitting on a chair" actually means from a single example instead of needing thousands of textual associations to infer it.