by 01092026
163 days ago
You asked us... well, first tell us: what's your real driver? You have three years on local infrastructure? What does that even mean - you've been running Llama 70B on Ollama for three years? What's your stack? None of that hardware can run larger models; tiny ones, or highly quantized versions of larger ones, sure. Or do you have something important to say?
|
Our stack changes per project, adapting to client needs and infra: Llama 70B on a Mac Studio M1 with Ollama in 2024, vLLM on 4xH100 private cloud for larger deployments. Most recently, we've been working on a custom workstation with 2x RTX PRO 6000 Blackwell Max-Q + 1.1TB DDR5 to run larger models locally using SGLang and KTransformers.
The question isn't rhetorical: I'm trying to understand whether the demand we see in regulated sectors is the whole market, or whether there's broader adoption I'm missing.