| HN Mirror

	by jmalicki 126 days ago
	The nice thing about modern LLMs is that they're a relatively large, static use case. The compute is large and expensive enough that you can afford to just write custom kernels, to a degree. It's not like the general CUDA situation, where you're running on 1, 2, or 8 GPUs and need libraries that already do it all for you, and where researchers are building lots of different models. There aren't all that many different small components across the transformer-based LLMs out there.

1 comment

gordonhart 126 days ago

Yeah, given that frontier model training has shrunk down to a handful of labs, it seems like a very solvable problem to just build the stack directly without CUDA. LLMs are mechanically simple, and these labs have access to as much engineering muscle as they need. That's a pretty small price to pay for access to cheaper hardware, given that model runs cost on the order of $100M and every lab is paying Nvidia many multiples of that to fill up their new datacenters.
