Hacker News
by echion, 161 days ago
> you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part
Are you doing this with vLLM, or some other model-running library/setup?
1 comment
coder543, 161 days ago
They're probably referencing this article:
https://blog.exolabs.net/nvidia-dgx-spark/