| HN Mirror



	by taffydavid 5 days ago
	Noob q: can advancements like this targeted at local inference have bonus effects for cloud inference? Presumably if you can get great results on cheaper hardware that also equates to less resource usage on cutting edge hardware, and less power draw? Will advancements like this ultimately reduce the carbon footprint of AI?

1 comment

goldenarm 5 days ago

Consumer and server hardware are quite different, especially Google's TPUs. Server deployments notably have much larger mixture-of-experts ratios and more complex caching systems. At such scale and inference budgets, they are incentivised to optimize as much as possible.

Also, Google DeepMind has a six-month embargo on strategic papers, so I bet the juiciest quantization tech isn't public yet.
