| HN Mirror



	by smcleod 360 days ago
	RTX is nice, but it's memory-limited and requires a full desktop machine to run it in. I'd take slower inference (as long as it's not less than 15 tok/s) for more memory any day!

1 comments

diggan 360 days ago

I'd love to see more Very-Large-Memory Mac Studio benchmarks for prompt processing and inference. The few benchmarks I've seen either failed to take prompt processing into account, didn't share the exact weights+setup that were used, or showed really abysmal performance.


chisleu 358 days ago

Oh I plan to produce a ton of that. I'll post a blog on it to HN and /r/localllama when I'm done.
