by namanski
786 days ago
I've built a product for exactly this: fine-tuning models and deploying the fine-tuned versions. You'll need GPUs for inference, you'll have to quantize the model, and you'll need to host it in the cloud. My platform follows that same workflow, but automates all of it (including autoscaling) and gives you an API endpoint; you only pay for the compute you host on. In general, which GPU(s) you choose depends on how large the model is and how many tokens/sec you want out of it.
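For a rough back-of-the-envelope sizing, here is a sketch. It is my own assumption and not how any particular platform does it. The weights alone take about params × bits / 8 bytes, and single-stream decoding is usually memory-bandwidth bound, so tokens/sec tops out near memory bandwidth ÷ weight size. The helper names, the 20% overhead factor for KV cache/activations, and the GPU figures in the usage lines are all hypothetical.

```python
import math


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def min_gpus(num_params: float, bits_per_param: int, gpu_mem_gb: float,
             overhead: float = 1.2) -> int:
    """Hypothetical helper: GPUs needed to hold the weights, padded by an
    assumed flat overhead factor for KV cache and activations."""
    needed_gb = weight_memory_gb(num_params, bits_per_param) * overhead
    return math.ceil(needed_gb / gpu_mem_gb)


def decode_tps_upper_bound(num_params: float, bits_per_param: int,
                           bandwidth_gb_s: float) -> float:
    """Rough ceiling on single-stream decode speed, assuming every generated
    token requires reading all weights once from GPU memory."""
    return bandwidth_gb_s / weight_memory_gb(num_params, bits_per_param)


if __name__ == "__main__":
    # A 7B-parameter model at 16-bit vs 4-bit quantization.
    print(weight_memory_gb(7e9, 16), weight_memory_gb(7e9, 4))
    # Hypothetical GPUs: 24 GB and 80 GB of memory.
    print(min_gpus(7e9, 16, 24), min_gpus(70e9, 16, 80))
    # Hypothetical 1000 GB/s of memory bandwidth, 7B model at 4-bit.
    print(round(decode_tps_upper_bound(7e9, 4, 1000.0), 1))
```

This shows why quantizing helps on both axes: fewer bits per parameter means the model fits on fewer or smaller GPUs and each token reads fewer bytes. Batching multiple requests raises aggregate throughput beyond this single-stream estimate.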