HN Mirror



	by caeril 594 days ago
	Bear in mind that a "1 million token" context window isn't actually that. You're being sold a sparse attention model, which is guaranteed to drop critical context. Google TPUs aren't running inference over a TERABYTE of fp8 query-key scores (1M x 1M entries at one byte each), let alone TWO of fp16. Google's marketing wins again, I guess.
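
	For anyone wanting to check the terabyte figures, here is a rough back-of-the-envelope sketch. It assumes the numbers refer to a dense 1M x 1M query-key score matrix for a single head in a single layer, stored in full; the comment doesn't spell that out, so treat it as one reading. The function name `dense_score_bytes` is made up for illustration.

```python
# Back-of-the-envelope size of a dense attention score matrix.
# Assumptions (not stated in the comment): one head, one layer, every query
# attends to every key, and the whole n x n matrix is held in memory at once.

def dense_score_bytes(n_tokens: int, bytes_per_entry: int) -> int:
    """Bytes needed for an n_tokens x n_tokens score matrix."""
    return n_tokens * n_tokens * bytes_per_entry


N = 1_000_000  # the advertised "1 million token" context

fp8_bytes = dense_score_bytes(N, 1)   # fp8: 1 byte per entry
fp16_bytes = dense_score_bytes(N, 2)  # fp16: 2 bytes per entry

# Decimal terabytes (1 TB = 10**12 bytes)
print(f"fp8:  {fp8_bytes / 1e12:.1f} TB")   # fp8:  1.0 TB
print(f"fp16: {fp16_bytes / 1e12:.1f} TB")  # fp16: 2.0 TB
```

	The quadratic term is what matters here: doubling the context quadruples the matrix, which is why a fully dense pass over 1M tokens looks so expensive on paper.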