by simonw
40 days ago
I believe it's an evolution of the technique used in GPT-Image-1 (or whatever they called that), which was derived from their work on making GPT-4o an "omni" model that can directly output images and audio in addition to text. The 2024 GPT-4o launch post https://openai.com/index/hello-gpt-4o/ hints at how that works: "With GPT‑4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network."