| HN Mirror



	by Silagi 58 days ago
	Did you consider the R9700 or B70 when you went for the MI100? If so, what made you choose the MI100? I've been toying with picking up a card in this class but haven't been able to justify it when running the Qwen3.6 MoE model on a 6800 XT is tolerable for the type of projects I've been willing to point local AI at.


sonzohan 58 days ago

I looked at those, the Arc 1100, the W6800, MI50, MI60, V100, V620, and basically anything with 32GB of RAM:

1. I wanted an AMD card.

2. I have an RTX 3090 that's been fun to play with, but I want to get back to using it for gaming.

3. I was looking for 30-60 tokens/second on the beefier models I want to run. For stock Qwen3 32B, the benchmarks reported about 41 tokens/second for the MI100. The W6800 was at 18, and the MI50 and MI60 could reach the 60s but needed a lot of compromises and special setup to get there.

4. I used FitMyLLM for some spec-based comparisons (https://www.fitmyllm.com/). The MI100 is roughly double the performance on Qwen 3.5 35B A3B Q5_K_M to the R9700 (462 token/s prefill vs 239 tokens/s, 217 tokens/s vs 118 token/s for inference)

5. I was willing to throw up to $1k at a GPU; I really wanted to throw closer to $650.

To be honest, if money were no object I would've sprung for an MI210. I also considered the MI250, as they showed up for $1250-1400 with a whopping 128GB, but the PCIe converters for that form factor don't have working AMD drivers yet.
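For anyone who wants to sanity-check the "roughly double" claim in point 4, here is a minimal sketch that just divides the quoted figures. The dictionary layout and the `speedup` helper are made up for illustration, and the inputs are the site's spec-based estimates, not measurements.

```python
# Estimated throughput (tokens/s) as quoted in point 4 above.
# These are FitMyLLM's spec-based estimates, not measured benchmarks.
ESTIMATES = {
    "MI100": {"prefill": 462, "generation": 217},
    "R9700": {"prefill": 239, "generation": 118},
}


def speedup(fast: str, slow: str, phase: str) -> float:
    """Ratio of estimated tokens/s between two cards for one phase."""
    return ESTIMATES[fast][phase] / ESTIMATES[slow][phase]


for phase in ("prefill", "generation"):
    ratio = speedup("MI100", "R9700", phase)
    print(f"{phase}: MI100 is {ratio:.2f}x the R9700")
```

That works out to about 1.93x on prefill and 1.84x on generation, so "roughly double" is a fair summary of the estimates as quoted.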


rft 58 days ago

> The MI100 is roughly double the performance on Qwen 3.5 35B A3B Q5_K_M to the R9700 (462 token/s prefill vs 239 tokens/s, 217 tokens/s vs 118 token/s for inference)

Those prefill numbers look really low to me. I can run nearly that same model (qwen 3.6) at q4km with q6 cache on a single 3090 and get 2.3k-4.4k prefill and 100-170 generation. Just based on raw numbers, I would expect the R9700 to land around 70-90 generation (it has about 2/3 of the memory bandwidth of a 3090) and at least the same or higher prefill (it has nearly 3x the FP16 throughput). So the numbers really don't add up. Was the benchmark done with some special settings, e.g. parallel requests or a very short prompt?
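The back-of-the-envelope reasoning here is that token generation is mostly limited by memory bandwidth, so a first guess is to scale a measured generation rate by the bandwidth ratio between cards. A minimal sketch of that, assuming the 2/3 ratio and the 3090 range quoted above (the `scale_range` helper is hypothetical, and real results also depend on drivers, kernels, and quantization):

```python
def scale_range(low: float, high: float, ratio: float) -> tuple[float, float]:
    """Scale a (low, high) throughput range by a hardware ratio."""
    return (low * ratio, high * ratio)


# Measured generation range on a single 3090, as reported in the comment.
MEASURED_3090_GENERATION = (100, 170)  # tokens/s

# Approximate R9700-to-3090 memory bandwidth ratio, as stated in the comment.
# This is the commenter's rough figure, not a datasheet value.
BANDWIDTH_RATIO = 2 / 3

low, high = scale_range(*MEASURED_3090_GENERATION, BANDWIDTH_RATIO)
print(f"naive R9700 generation estimate: {low:.0f}-{high:.0f} tokens/s")
```

Straight scaling gives roughly 67-113 tokens/s. The 70-90 figure in the comment sits inside that band; the sketch makes no attempt to reproduce it exactly.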


sonzohan 58 days ago

Numbers are from https://www.fitmyllm.com/ so they're not a real hardware benchmark, just an estimate of what you can expect to get. YMMV.


rft 58 days ago

Ah, ok. I took a look at the 3090 numbers and they list 400 tok/s prefill, so if I normalize my expectations to that baseline, the numbers you posted do make sense. I haven't dug deep into that site's methodology, but their estimates seem way off, especially since they don't take cache quant into account when deciding whether or not you can run a model. Overall I found that website a bit confusing, but maybe the UX just didn't click with me.
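A rough sketch of that normalization, using only figures quoted in this thread: the site's prefill estimates for each card divided by its 3090 estimate, plus the gap between the site's 3090 number and the measured 3090 range. The names are made up for illustration, and the comparison assumes the site's 400 tok/s figure is for a comparable model and quant, which the thread does not confirm.

```python
# Site prefill estimates (tokens/s) as quoted in this thread.
SITE_PREFILL = {"RTX 3090": 400, "MI100": 462, "R9700": 239}

# Measured prefill range on a real 3090, as reported earlier in the thread.
MEASURED_3090_PREFILL = (2300, 4400)


def relative_to(baseline: str, estimates: dict[str, float]) -> dict[str, float]:
    """Express every estimate as a multiple of the baseline card's estimate."""
    base = estimates[baseline]
    return {card: value / base for card, value in estimates.items()}


relative = relative_to("RTX 3090", SITE_PREFILL)
for card, ratio in relative.items():
    print(f"{card}: {ratio:.2f}x the site's 3090 estimate")

# How far the site's 3090 estimate is from the measured range.
gap = tuple(m / SITE_PREFILL["RTX 3090"] for m in MEASURED_3090_PREFILL)
print(f"measured 3090 prefill is {gap[0]:.2f}x-{gap[1]:.2f}x the site's estimate")
```

Relative to the site's own 3090 baseline, the MI100 comes out around 1.16x and the R9700 around 0.60x, while the measured 3090 prefill is somewhere between 5.75x and 11x the site's number. So the ranking between cards may be usable even if the absolute figures are not.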
