by kir-gadjello
1185 days ago
This document doesn't contain the architecture and training details of GPT-4. As an engineer, I would find these details the most interesting part of it! Driven by interest in GPT-4 and cutting-edge LLMs, I studied the research literature and compiled a short list of architectural and training details that very likely underpin GPT-4 in this blog post: https://kir-gadjello.github.io/posts/gpt4-some-technical-hyp... While this is a work in progress, the most important part is already in place, so I decided to publish it in its current draft state. Have fun following the TLDR and arXiv links, fellow HNers!
I've added your hypothesis to the ones collected here:
https://lifearchitect.ai/gpt-4/
There's quite a broad range of guesses going around. I lean towards 80B language + 20B vision params trained on 3T collected tokens (possibly repeated to 10T+), but one of the other (strong) hypotheses is a dense 7T-param model. That's absurd...