by bufferoverflow
946 days ago
I thought GPT-4 was not trained on labeled data, but simply on a large volume of text and code. Most of it is publicly accessible: Wikipedia, archives of scientific articles, books, GitHub, plus probably purchased data from text-heavy sites like Reddit.

Another example is the Be My Eyes data: presumably the vision part of GPT-4 was trained on the archive of data the blind-assistance app has, and that could be an exclusive deal with OpenAI.