Hacker News
by Doxon 27 days ago
These techniques are used by DeepSeek and work well with the commodity NVIDIA GPUs they use. Google designs its entire AI stack from the custom silicon up, so it takes different optimization approaches (though Gemma does use MTP).