by mikewarot
321 days ago
You know what's actually hard to find in all this? The dimensions of the arrays in GPT-OSS-120B. At least with statically typed languages, you can see how big your arrays are at a glance. I've been looking through the GitHub repo[1] and I'm not seeing them.

I'm just trying to figure out how wide the data stream through the model is: the actual data (not the weights) that flows through all of it, and in particular the width of the output stream. How big is a token at the output, before sampling with "temperature" reduces it to a few bytes?

Assume infinitely fast compute in a magic black box, but you have to send the output over gigabit Ethernet. What's the maximum number of tokens per second?

[1] https://github.com/openai/gpt-oss/tree/main/gpt_oss
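For a rough sense of scale, here is a back-of-envelope sketch of the gigabit Ethernet question. The widths are assumptions on my part, not numbers taken from the repo: a vocabulary of 201,088 entries and a residual width of 2,880 are the figures I recall from the published model card, so check them against the checkpoint's config before relying on them. The function and constant names are made up for illustration, and the link is treated as an ideal 1 Gbit/s with no framing or protocol overhead.

```python
# Back-of-envelope: how many per-token outputs fit through an ideal 1 Gbit/s link?
# All model dimensions below are ASSUMPTIONS recalled from the model card, not read from the repo.

GIGABIT_BYTES_PER_S = 1_000_000_000 / 8  # 125,000,000 bytes/s, ignoring Ethernet/IP overhead

ASSUMED_VOCAB_SIZE = 201_088   # assumed width of the logit vector (one value per vocabulary entry)
ASSUMED_HIDDEN_SIZE = 2_880    # assumed width of the residual stream between layers


def bytes_per_token(width: int, bytes_per_value: int) -> int:
    """Size in bytes of one token's worth of data at the given vector width."""
    return width * bytes_per_value


def max_tokens_per_second(width: int, bytes_per_value: int,
                          link_bytes_per_s: float = GIGABIT_BYTES_PER_S) -> float:
    """Upper bound on tokens/s if every token ships `width` values over the link."""
    return link_bytes_per_s / bytes_per_token(width, bytes_per_value)


if __name__ == "__main__":
    cases = [
        ("full logits, 16-bit", ASSUMED_VOCAB_SIZE, 2),
        ("full logits, 32-bit", ASSUMED_VOCAB_SIZE, 4),
        ("hidden state, 16-bit", ASSUMED_HIDDEN_SIZE, 2),
        ("sampled token id, 32-bit", 1, 4),
    ]
    for label, width, nbytes in cases:
        size = bytes_per_token(width, nbytes)
        rate = max_tokens_per_second(width, nbytes)
        print(f"{label:26s} {size:>9,d} bytes/token  {rate:>14,.1f} tokens/s")
```

Under those assumptions, shipping the full pre-sampling logit vector costs a few hundred kilobytes per token, which caps the link at a few hundred tokens per second, while sending only the sampled token id is effectively free.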