HN Mirror



	by overgard 100 days ago
	I can't answer for an RTX 5090, but on a desktop RTX 5080 with 16GB of VRAM I get about 6 tokens/sec after some tweaking (f16 -> q4_0). That's borderline usable. Realistically you probably need either a 5090 with more memory or something like a Mac with a unified memory architecture.
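
For anyone wondering why the f16 -> q4_0 switch matters on a 16GB card, here is a rough back-of-the-envelope sketch of weight memory per format. The bits-per-weight figures follow the GGML block layout (blocks of 32 weights plus one fp16 scale per block). The parameter count and the overhead allowance are hypothetical placeholders, since the thread does not state the model size, and the function names are made up for illustration.

```python
# Rough estimate of weight memory for GGML-style formats.
# q4_0 and q8_0 store weights in blocks of 32 with one fp16 (16-bit) scale per block.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def weight_gib(n_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no activations)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


def fits(n_params: float, fmt: str, vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM.

    overhead_gib is a guessed allowance for KV cache and buffers, not a measured value.
    """
    return weight_gib(n_params, fmt) + overhead_gib <= vram_gib


if __name__ == "__main__":
    n_params = 30e9  # hypothetical model size; substitute the real parameter count
    for fmt in BITS_PER_WEIGHT:
        size = weight_gib(n_params, fmt)
        print(f"{fmt:>5}: {size:6.1f} GiB, fits in 16 GiB: {fits(n_params, fmt, 16.0)}")
```

Anything that does not fit gets partly offloaded to system RAM, which is usually where the tokens/sec number drops off. This only counts weights, so treat it as a lower bound.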

2 comments

datadrivenangel 100 days ago

My M5 Pro is getting ~11 tokens per second via OMLX for an 8-bit quant.


angoragoats 100 days ago

A Mac is not going to be all that much faster than a 5080 on any model, other than the ones you can’t currently run at all because you don’t have enough combined GPU and CPU memory.

You’re much better off adding a second GPU if you’ve already got a PC you’re using.
