Hacker News
by
vkaufmann
121 days ago
GPT-OSS-120B runs like hell on my DGX Spark
1 comments
embedding-shape
121 days ago
The MXFP4 variant, I suppose? My setup (RTX Pro 6000) does around 140 tok/s with llama.cpp and around 160 tok/s with vLLM.
vkaufmann
121 days ago
Yep, MXFP4. Really fast :D