Hacker News
by smcleod 360 days ago
RTX is nice, but it's memory-limited and needs a full desktop machine to run in. I'd take slower inference (as long as it's not below 15 tok/s) in exchange for more memory any day!
1 comment

I'd love to see more very-large-memory Mac Studio benchmarks for prompt processing and inference. The few benchmarks I've seen either failed to take prompt processing into account, didn't share the exact weights and setup used, or showed really abysmal performance.
Oh, I plan to produce a ton of those. I'll post a blog about it to HN and /r/localllama when I'm done.