
	by ephemeral-life 829 days ago
	30x is the type of number that when you see it in a generational improvement, you should ignore it as marketing fluff.

azeirah 829 days ago

As I understand it, they optimised the entire stack, from CUDA to the networking interconnects, specifically for data centers, so a datacenter gets 30x more inference per dollar. This is probably not fluff, but it's only relevant for a very specific use case, i.e. enterprises with the money to buy a stack that serves thousands of users with LLMs.

It doesn't matter for anyone who's not Microsoft, AWS, OpenAI or similar.

misterdabb 828 days ago

It's a weird graph... The y-axis is specifically tokens per GPU, while the x-axis is "interactivity per second", so the y-axis includes Blackwell being twice the size and also the move from fp8 -> fp4. Note that the fp4 gain needs to be counted multiple times, since half as much data has to go through the networks as well.
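
A rough sketch of the "half as much data" point, in Python. It only counts storage size per value (fp8 is 8 bits, fp4 is 4 bits). The parameter count is a made-up number for illustration, and nothing here says how the 30x actually breaks down.

```python
# Bits per stored value for each format; these widths follow from the format names.
BITS_PER_VALUE = {"fp16": 16, "fp8": 8, "fp4": 4}


def weight_bytes(num_params: int, fmt: str) -> float:
    """Bytes needed to hold num_params values in the given format."""
    return num_params * BITS_PER_VALUE[fmt] / 8


# Hypothetical model size, chosen only for illustration.
params = 70_000_000_000

fp8_bytes = weight_bytes(params, "fp8")
fp4_bytes = weight_bytes(params, "fp4")

print(f"fp8: {fp8_bytes / 1e9:.1f} GB")
print(f"fp4: {fp4_bytes / 1e9:.1f} GB")
print(f"ratio: {fp8_bytes / fp4_bytes:.1f}x less data to store and move")
```

The same halving applies wherever those values are stored or sent, which is why the format change can show up in more than one place in a per-GPU throughput number.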

acchow 829 days ago

They showed that the 30x was for FP4. Who is using FP4 in practice?

KaoruAoiShiho 828 days ago

But maybe you should. Once the software stack is ready for it, more people will use it, since the performance gains are so massive.

dagmx 828 days ago

It would depend heavily on the model, though. Some models will hold up better at FP4 than others.
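
A toy sketch of why that might be, in Python. It assumes FP4 means the 4-bit E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) and uses a single per-tensor scale, which is a simplification of my own; real stacks likely scale in smaller blocks. The two "tensors" are synthetic, not taken from any model.

```python
import random

# Magnitudes representable in a 4-bit E2M1 float.
# Assumption: this is the FP4 variant the thread is talking about.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({sign * m for m in FP4_MAGNITUDES for sign in (-1.0, 1.0)})


def quantize_fp4(values):
    """Scale so the largest magnitude maps to 6.0, round each value to the
    nearest representable point, then scale back."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)
    scale = peak / 6.0
    return [min(FP4_GRID, key=lambda g: abs(v / scale - g)) * scale for v in values]


def relative_rms_error(values):
    """RMS of the quantization error divided by RMS of the original values."""
    quantized = quantize_fp4(values)
    err = sum((a - b) ** 2 for a, b in zip(values, quantized))
    ref = sum(a * a for a in values)
    return (err / ref) ** 0.5


rng = random.Random(0)
smooth = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # well-behaved values
spiky = smooth[:-1] + [100.0]  # same values plus one large outlier

print("smooth:", relative_rms_error(smooth))
print("spiky: ", relative_rms_error(spiky))
```

With one shared scale, the outlier stretches the grid so far that most of the small values round to zero, so the spiky tensor loses much more than the smooth one. Whether a given model looks more like the first case or the second is the part that depends on the model.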
