Bringing Up DeepSeek-V4-Flash on AMD MI300X

	Bringing Up DeepSeek-V4-Flash on AMD MI300X (fergusfinn.com)
	120 points by kkm 14 days ago

7 comments

maCDzP 14 days ago

I train on AMD MI250X and managed to get Gemma 4 31B to work - but it took a lot of work on the software side.

kkm 14 days ago

This is very interesting, planning to write about it?

kkm 14 days ago

Also the vllm patch accompanying the blogpost: https://github.com/doublewordai/vllm-amd-blog-doubleword

mezark 14 days ago

We at Doubleword are bullish on AMD for low-interactivity inference - it does just take a bigger lift on the software side...

brcmthrowaway 14 days ago

Are you long AMD?

latchkey 14 days ago

Interesting that you ask that as AMD hits another ATH.

brcmthrowaway 14 days ago

Then you are definitely long on AMD.

latchkey 14 days ago

More accurately... I'm long on a viable alternative to the current monopoly. We have two OSes for phones (Android and iOS); there is no reason why we shouldn't have the same for all AI hardware and software. The only one even close is AMD.

boxking 14 days ago

Hello, sir. I want to bulk order the ASRock BC-250. Is it still available?

latchkey 14 days ago

lol find the discord!

boxking 14 days ago

Yes, sir. Is there any possibility of finding 1000 pcs or more?

edg5000 14 days ago

Checked out this company about a year ago and they only offered small models. Now I see they have GLM-fp8/Kimi and DeepSeek V4 Pro. Since workloads are predominantly cached input, I'm surprised to see no separate price for cached input vs uncached. I hope the prices will drop significantly; with these prices you'll end up with thousands in monthly costs quickly. Hopefully more hardware companies will be on the market in the coming years. If the Chinese eventually start competing with the current memory makers, maybe that will help.

mezark 13 days ago

Hi! Co-founder of Doubleword here - we've hugely increased the number of models that we offer (partly thanks to work that we've done on hotswapping: https://blog.doubleword.ai/fast-sglang-starts).

We're kind of known for our low prices - our prices (our main usage is for our high-throughput API, the async tier) are significantly below average OpenRouter prices - and cached-input pricing is coming soon, which will lower them even more :)

edg5000 11 days ago

What kind of workloads are you primarily seeing from users? I'd guess coding harness-type stuff where you have repeated calls with lots of cache hits. Or is it more like bulk OCR or invoice processing?

benlm 14 days ago

Nice work! Would DeepSeek V4 Pro on 8xMI300X work with these patches?

mezark 13 days ago

we think so - but haven't tested it ourselves

latchkey 14 days ago

Nice work and thanks for being a customer.

(CEO Hot Aisle)

erichocean 14 days ago

I wish you guys could partner with Modular to get Mojo inference working on your hardware, e.g. https://www.modular.com/models/deepseek-v4-pro

latchkey 14 days ago

Not sure I understand. If they support MI300X, their self-hosted version will run on our hardware.

erichocean 7 days ago

If it was that easy, I wouldn't have commented.

It's not, which is why it would be nice if they did the actual work (on your hardware).

I would 100% pay $16/hr to run a self-hosted instance, but I won't spend thousands of dollars to (maybe) get it working (my time + the hardware).

latchkey 7 days ago

Ok, sure. Valid. Have you asked them to support V4?

https://docs.modular.com/max/models/

I agree with you though, serving up inference is secret sauce for a lot of teams and not everyone publishes how to do it because of the costs involved in doing so. They need an ROI.

alfiedotwtf 14 days ago

It’s just weird that DeepSeek released a model that was not compatible with any of the usual engines. Without derez’s new project just to support DSv4, how long until it’s actually viable in llama :(
