| HN Mirror


by namanski 786 days ago

I've built a product for exactly this: fine-tuning models and deploying the fine-tuned results.

You'll need GPUs for inference, you'll have to quantize the model, and you'll have to host it in the cloud. The platform I've built follows the same workflow, but all of it is automated, including autoscaling. You get an API endpoint and pay only for the compute you host on.

Generally, the GPU(s) you choose will depend on how big the model is and how many tokens/sec you need to get out of it.
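
To make that sizing trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not how any particular platform picks hardware. It assumes weight memory is roughly parameters times bits per weight, adds a guessed 20% overhead for the KV cache and activations, and uses the common rule of thumb that single-stream decoding is memory-bandwidth bound (so tokens/sec is at most bandwidth divided by weight size). The `Gpu` class, the catalog entries, and all numbers in them are hypothetical placeholders; substitute the specs of the cards you can actually rent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gpu:
    """Hypothetical GPU description; fill in real specs for your provider."""
    name: str
    memory_gb: float        # total device memory
    bandwidth_gb_s: float   # memory bandwidth


def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights: params * bits / 8 bytes."""
    return params_billions * bits_per_weight / 8


def max_decode_tokens_per_sec(gpu: Gpu, weights_gb: float) -> float:
    """Upper bound for batch-size-1 decoding, assuming every generated token
    requires reading all weights from GPU memory once."""
    return gpu.bandwidth_gb_s / weights_gb


def pick_gpu(
    catalog: list[Gpu],
    params_billions: float,
    bits_per_weight: int,
    target_tokens_per_sec: float,
    overhead_fraction: float = 0.2,  # assumed headroom for KV cache etc.
) -> Optional[Gpu]:
    """Return the smallest GPU that fits the model and meets the speed target,
    or None if a single GPU from the catalog is not enough."""
    weights_gb = weight_memory_gb(params_billions, bits_per_weight)
    needed_gb = weights_gb * (1 + overhead_fraction)
    for gpu in sorted(catalog, key=lambda g: g.memory_gb):
        fits = gpu.memory_gb >= needed_gb
        fast_enough = max_decode_tokens_per_sec(gpu, weights_gb) >= target_tokens_per_sec
        if fits and fast_enough:
            return gpu
    return None


# Illustrative catalog with made-up specs.
CATALOG = [
    Gpu("small-24gb", memory_gb=24, bandwidth_gb_s=300),
    Gpu("large-80gb", memory_gb=80, bandwidth_gb_s=2000),
]

if __name__ == "__main__":
    for bits in (16, 4):
        choice = pick_gpu(CATALOG, params_billions=70, bits_per_weight=bits,
                          target_tokens_per_sec=10)
        print(f"70B at {bits}-bit:", choice.name if choice else "needs multiple GPUs")
```

With these placeholder numbers, a 70B model at 16-bit does not fit on any single card in the catalog, while the 4-bit version fits on the larger one, which is the main reason quantization shows up in the workflow above. Real throughput also depends on batching, the inference engine, and context length, so treat the output as a starting point only.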