Cheapest way to deploy smaller fine-tuned AI models?
2 points by johhns4 953 days ago
Any tips on how to deploy and use a fine-tuned model on Hugging Face in a cost-effective way? Right now I'm looking into using Gradio with Hugging Face Spaces and calling the API endpoint from there. Inference Endpoints and SageMaker seem excessive for this. The whole idea of using smaller models is to decrease costs (vs. using a bigger model behind an API endpoint), but maybe this just isn't cost-effective for where we are right now.
1 comment

If you're only using it intermittently, Replicate and Modal Labs both have per-second pricing.

Not sure about Hugging Face, though.

SageMaker supposedly has a serverless endpoint option, but I haven't looked into it and doubt it would be a good deal since it's AWS.
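
To make the per-second vs. always-on tradeoff concrete, here is a rough back-of-the-envelope sketch. All prices and traffic numbers are made-up placeholders, not actual Replicate, Modal, Hugging Face, or AWS rates, and it ignores cold starts and minimum billing increments, so plug in real figures from each provider's pricing page before deciding.

```python
# Rough cost comparison: per-second billing vs. an always-on instance.
# Every number below is a hypothetical placeholder, not a real provider price.

HOURS_PER_MONTH = 730  # average number of hours in a month


def per_second_monthly_cost(requests_per_month, seconds_per_request, price_per_second):
    """Monthly cost when you only pay for the seconds spent on inference."""
    return requests_per_month * seconds_per_request * price_per_second


def always_on_monthly_cost(price_per_hour, hours_per_month=HOURS_PER_MONTH):
    """Monthly cost of an instance that runs (and bills) around the clock."""
    return price_per_hour * hours_per_month


def break_even_requests(seconds_per_request, price_per_second, price_per_hour):
    """Requests per month at which both options cost the same."""
    cost_per_request = seconds_per_request * price_per_second
    return always_on_monthly_cost(price_per_hour) / cost_per_request


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests a month, 2 s of GPU time each.
    requests = 10_000
    seconds = 2.0
    price_per_second = 0.0005  # assumed $/s on a per-second platform
    price_per_hour = 0.60      # assumed $/h for an always-on endpoint

    pay_per_use = per_second_monthly_cost(requests, seconds, price_per_second)
    always_on = always_on_monthly_cost(price_per_hour)
    threshold = break_even_requests(seconds, price_per_second, price_per_hour)

    print(f"per-second billing: ${pay_per_use:.2f}/month")
    print(f"always-on instance: ${always_on:.2f}/month")
    print(f"break-even at about {threshold:,.0f} requests/month")
```

With low, bursty traffic the per-second option tends to win by a wide margin; once the endpoint is busy most of the day, a flat hourly rate can come out cheaper.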

Looks like Replicate is perfect. Will look into it. Thanks!