by neutrinobro 25 days ago
Very nice to see this older hardware getting repurposed. I have been running 2x Tesla V100s in a dual-socket Supermicro X10DRU-i server. With qwen3.6-27B-mtp I get about 35-40 tok/s for inference at moderate context sizes (<128k), and I have run long-running agent tasks on it that consumed hundreds of millions of tokens (hundreds of dollars or more if I had to pay Claude API prices). However, the main thing I use these cards for is scientific compute. Their FP64 performance (7+ TFLOPS!) is fantastic given their age, and it is not something you can get even on the latest consumer-grade cards, since Nvidia nerfed FP64 performance on those after Kepler. The server lives in the basement, though... it is freaking loud!
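For anyone wondering where the "7+ TFLOPS" figure comes from, here is a back-of-the-envelope sketch of theoretical peak FP64 throughput: SM count × FP64 units per SM × 2 FLOPs per fused multiply-add × boost clock. It assumes NVIDIA's published V100 figures (80 SMs with 32 FP64 units each; roughly 1.38 GHz boost for the PCIe card and 1.53 GHz for SXM2). Which variant sits in this particular server isn't stated, so both are shown. Sustained throughput in real workloads will be lower than this peak.

```python
def peak_tflops(sms: int, fp64_units_per_sm: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP64 TFLOPS, assuming every FP64 unit issues one FMA (2 FLOPs) per cycle."""
    flops_per_cycle = sms * fp64_units_per_sm * 2
    return flops_per_cycle * boost_clock_ghz * 1e9 / 1e12


# Assumed Tesla V100 figures from NVIDIA's published specs (per card).
V100_SMS = 80
V100_FP64_PER_SM = 32

if __name__ == "__main__":
    for variant, clock_ghz in [("PCIe", 1.38), ("SXM2", 1.53)]:
        tflops = peak_tflops(V100_SMS, V100_FP64_PER_SM, clock_ghz)
        print(f"V100 {variant}: {tflops:.2f} TFLOPS FP64 per card")
```

This works out to about 7.07 TFLOPS for the PCIe card and 7.83 TFLOPS for SXM2, so two cards give a theoretical ceiling of roughly 14-16 TFLOPS FP64.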