by agnosticmantis
934 days ago
He’s said it’s easy to generate these kinds of questions that trick LLMs, because models trained solely on text lack physical grounding. And that’s as true now as ever. I also heard him say that training multimodal models on text plus images/video would mitigate the grounding issue, and that has proven to be true too. So I’m not sure exactly what your objection is.