

	by angoragoats 691 days ago
	You could for sure, but the NVIDIA setup described in this article would be many times faster at inference. So it’s a tradeoff between power consumption and performance. Also, modern GPUs are surprisingly good at throttling their power usage when not actively in use, just like CPUs. So while you need 3 kW+ worth of PSU for an 8x 3090 setup, it’s not going to be using anywhere near 3 kW of power on average, unless you’re literally using the LLM 24x7.

3 comments

exyi 691 days ago

Even if you are running it constantly, the per-token power consumption is likely going to be in a similar range, not to mention you'd need 10+ Macs to match the throughput.
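
To make the per-token comparison concrete, here is a minimal sketch of the arithmetic: energy per token is average power draw divided by throughput. The wattage and tokens-per-second figures below are hypothetical placeholders, not measurements from this thread or the article; substitute your own readings.

```python
def joules_per_token(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token. Watts are joules per second,
    so dividing by tokens per second gives joules per token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


# Hypothetical placeholder figures, NOT measurements.
gpu_rig = joules_per_token(avg_power_watts=2000.0, tokens_per_second=100.0)
mac = joules_per_token(avg_power_watts=200.0, tokens_per_second=10.0)

print(f"GPU rig: {gpu_rig:.1f} J/token, Mac: {mac:.1f} J/token")
```

With these made-up inputs, a rig drawing ten times the power at ten times the throughput spends the same energy per token, which is the shape of the argument above.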


robotnikman 691 days ago

I have a 3090 power-capped at 65%, and I only notice a minimal difference in performance.
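
For anyone wanting to try a similar cap, here is a rough sketch that turns a percentage into an `nvidia-smi` power-limit command. It assumes a 350 W default limit for the 3090, which varies by board, so check what `nvidia-smi` reports for your card first. The helper name `power_limit_command` is made up for illustration, and the code only builds the command rather than running it.

```python
def power_limit_command(gpu_index: int, default_limit_watts: float,
                        fraction: float) -> list[str]:
    """Build an nvidia-smi invocation that caps one GPU at a fraction
    of its default power limit (rounded down to whole watts)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    watts = int(default_limit_watts * fraction)
    # -i selects the GPU, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


# Assumption: 350 W default limit; your card may differ.
cmd = power_limit_command(gpu_index=0, default_limit_watts=350, fraction=0.65)
print(" ".join(cmd))
```

Changing the power limit typically needs administrator privileges, and the driver rejects values outside the range the card supports.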


cranberryturkey 691 days ago

Can Reflection:70b work on them?


christianqchung 691 days ago

Pretty sure it'll work wherever any 70b model would, but it's probably not noticeably better than Llama 3.1 70b if the reports I'm reading now are correct.[1]

[1] https://x.com/JJitsev/status/1832758733866222011


angoragoats 691 days ago

Maybe you meant to reply to a different comment? Work on what?

Edit: I guess to directly answer your question, I don’t see why you couldn’t run a 70b model at full quality on either an M2 192GB machine or an 8x 3090 setup.
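
A back-of-the-envelope check of that claim, as a sketch: it assumes "full quality" means 16-bit weights (2 bytes per parameter) and counts only the weights, ignoring KV cache, activations, and runtime overhead, which all need extra room. The 24 GB per 3090 figure is the card's standard memory size; the function names are made up for illustration.

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).
    Billions of parameters times bytes per parameter gives GB directly."""
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory.
    Real deployments also need headroom for KV cache and activations."""
    return weights_gb(params_billions, bytes_per_param) <= memory_gb


model_gb = weights_gb(70, 2)   # assumption: 16-bit weights
rig_vram_gb = 8 * 24           # eight 3090s at 24 GB each
mac_memory_gb = 192            # unified memory; not all of it is GPU-usable

print(f"70b at 16-bit: ~{model_gb} GB of weights")
print(f"8x 3090: {rig_vram_gb} GB, fits: {fits(70, 2, rig_vram_gb)}")
print(f"192GB Mac: {mac_memory_gb} GB, fits: {fits(70, 2, mac_memory_gb)}")
```

Both setups come out at 192 GB against roughly 140 GB of weights, so the headroom is the same on paper. How much of that is actually usable differs between the two.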
