Hacker News
by dougmwne 841 days ago
I don’t think we are at a plateau. We may have fed a large amount of text into these models, but when you add up all the other kinds of media (images, video, sound, 3D models), there’s a vastly richer dataset about the world. Sora showed that these models can learn a lot about physics and cause and effect just from video feeds. Once all of this is combined into multimodal mega-models, then we may be closer to the plateau.