by caeril
547 days ago
Bear in mind that a "1 million token" context window isn't actually that. You're being sold a sparse attention model, which is guaranteed to drop critical context. Dense attention over 1M tokens means 1M x 1M = 10^12 query-key scores per head, and Google TPUs aren't running inference on a TERABYTE of fp8 query-key scores, let alone TWO of fp16. Google's marketing wins again, I guess.
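For anyone who wants to check the terabyte figures, here is a rough back-of-the-envelope sketch. It assumes a single attention head in a single layer, a fully materialized N x N score matrix with no causal masking or tiling, and decimal terabytes (10^12 bytes). The function name `dense_score_bytes` is made up for illustration and is not from any library.

```python
def dense_score_bytes(context_tokens: int, bytes_per_score: int) -> int:
    """Bytes needed to hold one full context_tokens x context_tokens score matrix.

    Assumption: every query-key score is stored at once (one head, one layer).
    """
    return context_tokens * context_tokens * bytes_per_score


N = 1_000_000  # the advertised "1 million token" window

fp8_bytes = dense_score_bytes(N, 1)   # fp8: 1 byte per score
fp16_bytes = dense_score_bytes(N, 2)  # fp16: 2 bytes per score

print(f"fp8:  {fp8_bytes / 1e12:.1f} TB")   # prints "fp8:  1.0 TB"
print(f"fp16: {fp16_bytes / 1e12:.1f} TB")  # prints "fp16: 2.0 TB"
```

These numbers only cover one materialized score matrix. Multiply by heads and layers if you assume they are all held at once, which a real implementation may well not do.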