Hacker News
by tim-tday 114 days ago
How specifically do you propose this? Have you tried running inference on a local machine? (You should try that.) I run local models on desktop machines, laptops, dedicated servers, and cloud servers.

I’ve built my own LLM containers, and I’ve built orchestration systems for fine-tuning and model management. I’ve tried quantized models. I’ve tested a dozen or so models of different sizes.

You can’t really get around the fact that inference on CPU is slow and inference on GPU is gated by VRAM (you need about 1 GB of VRAM per billion parameters; quantization reduces quality and increases operational toil). If you know of a consumer-level GPU with 80-128 GB of VRAM that I can buy for less than $10k, do please let me know.
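For what it's worth, the "about 1 GB per billion parameters" rule of thumb corresponds to storing each weight in 8 bits. Here is a rough back-of-the-envelope sketch of that arithmetic. The function name `weights_gb` and the 70B example size are just illustrative assumptions, and the estimate covers the weights only, so the KV cache, activations, and framework overhead would come on top of it.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; ignores KV cache, activations,
    and runtime overhead, which all add to the real footprint.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Compare a 70B-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{weights_gb(70, bits):.0f} GB")
```

At 16 bits per weight the requirement roughly doubles to 2 GB per billion parameters, and at 4 bits it halves to about 0.5 GB, which is why quantization is the usual way to squeeze a larger model onto a smaller card.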

Short of a specific proposal I’m going to classify your suggestion as not knowing enough about what you’re talking about for your proposal to make any sense.

1 comment

You seem to have done a lot, but always at a high level. You always started from the assumption that you need a lot of parameters to get good answers. Now tell me this: who says that? You obviously lack the hardware to train a model from scratch, so you don't know the mathematics behind what training from scratch means. You may know the programs, but somebody else created those programs. That wasn't you. So you are a consumer, not a creator in the original sense. I'm only willing to continue with this if you don't go ad hominem.