by Bolwin
18 days ago
No one is producing one output token though. And using up GPUs for that cache is a pretty big opportunity cost. I highly doubt it's done in VRAM; that would be insane for the one-hour caches. So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token. Is it a scam? Idk.