by visarga 1458 days ago
It won't be just a text and image generation model; the train ride doesn't stop. They are also learning how to act.

Soon the GPTs will learn from video: the footage itself, the audio and the subtitles. There are billions of hours of video content on YouTube, and this new modality will make it easier to learn the procedural knowledge (how we do things) that is not apparent in text or static images. The new GPT will be able to play games, use computers, control robots and do all sorts of reinforcement learning tasks. There are already a few papers, for example learning Minecraft from YouTube videos (https://openai.com/blog/vpt/).

Of course they will also generate long-form videos. The problem with video is cost: it's very expensive.