HN Mirror



	by deyiao 480 days ago
	I heard their inference framework is way lower than typical deployment methods. Can this be verified from that open-source project? How does it stack up against vLLM or llama.cpp?

4 comments

reissbaker 480 days ago

By "lower" you mean cheaper/better?

I suspect it's much higher throughput than vLLM, which in turn is much higher throughput than llama.cpp. The MLA kernel they just open-sourced seems to indicate that, although we'll see how it does in third party benchmarks on non-hobbled GPUs vs FlashAttention. They only released the BF16 version — whereas most people, including DeepSeek themselves, serve in FP8 — so it might not be immediately useful to most companies quite yet, although I imagine there'll be FP8 ports soon enough.


nialv7 480 days ago

I think they meant lower level.


bee_rider 480 days ago

It seems hard to guess. Could be lower level, lower performance, or lower compute cost.


helloericsf 480 days ago

What do you mean by "lower"? To my understanding, they will open-source 5 infra-related repos this week. Let's revisit your comparison question on Friday.


find0x90 480 days ago

I don't see any use of PTX; it might be in one of the other repos they plan to release.


DesiLurker 480 days ago

Right, I think PTX use is a bigger deal than the coverage it's getting suggests. It gives other vendors an opening to get a foot in the door with PTX-to-LLVM-IR translation for existing CUDA kernels.


feverzsj 480 days ago

Maybe. Apple ditched them in China because their infra can't handle users at large scale.


helloericsf 480 days ago

I don't think the decision was based on infra or any technical reason. It's more on the service support side. How does a 200-person company support 44M iPhone users in China?


chvid 480 days ago

Is that true? I thought Apple was going to use their own infrastructure.


tw1984 480 days ago

DeepSeek doesn't have any experience supporting a 50-million-user base. That was the reason cited by Apple a few weeks ago.
