Hacker News
by Nischalj10 999 days ago
This looks a lot like Replicate (https://replicate.com/). Has anyone tried it? How's the experience with cold starts?

We have models that are crucial but do not require dedicated hosting. We are looking for an AWS Lambda type of service, but for a fine-tuned llama2-13b. Any suggestions? We would try out Cloudflare AI too.

2 comments

The problem with all existing pay-as-you-go vendors is that the overall price is exceptionally high if you use any decent amount of compute. That and cold starts.

Funnily enough, it's often cheaper, and far better in quality and latency, to pay for a full server.
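To put rough numbers on that, here is a minimal sketch of the break-even arithmetic. The prices are hypothetical placeholders, not quotes from any vendor, and it ignores cold-start billing, ops overhead, and idle autoscaling, so treat it as a back-of-the-envelope check only.

```python
def break_even_hours(dedicated_monthly_usd: float, serverless_usd_per_second: float) -> float:
    """Billed compute hours per month at which serverless cost equals a dedicated server."""
    return dedicated_monthly_usd / (serverless_usd_per_second * 3600)


# Hypothetical prices, for illustration only.
DEDICATED_MONTHLY_USD = 1000.0      # flat monthly price of a full GPU server
SERVERLESS_USD_PER_SECOND = 0.001   # per-second price of a comparable serverless GPU

hours = break_even_hours(DEDICATED_MONTHLY_USD, SERVERLESS_USD_PER_SECOND)
utilization = hours / (30 * 24)     # fraction of a 30-day month
print(f"break-even: {hours:.1f} GPU-hours/month ({utilization:.0%} utilization)")
```

If your billed usage sits above the break-even figure, the dedicated server is the cheaper option under these assumptions; below it, pay-as-you-go still wins.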

Yeah, once the traffic is large enough, it does make sense to have your own compute.

Cloudflare AI and Replicate are great for running off-the-shelf models, but anything custom is going to incur a 10+ minute cold start.

For running custom fine-tuned models on serverless, you could look into https://beam.cloud, which is optimized for serving custom models with extremely fast cold starts. (I'm a little biased since I work there, but the numbers don't lie.)

Thanks! Looks promising from the outside. Will surely check it out.

Why would it incur a cold start of 10 minutes on Cloudflare? :O

Any proof?
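One way to settle this is to measure it yourself. Below is a minimal sketch that times the first call against a few warm calls. `measure_cold_start` and `FakeEndpoint` are hypothetical names made up for this example; `FakeEndpoint` only simulates a slow first request, so you would swap in a function that actually hits your own deployment.

```python
import time
from typing import Callable


def measure_cold_start(call: Callable[[], object], warm_runs: int = 3) -> tuple[float, float]:
    """Return (first_call_seconds, mean_warm_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    call()
    first = time.perf_counter() - start

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        call()
        warm.append(time.perf_counter() - start)
    return first, sum(warm) / len(warm)


class FakeEndpoint:
    """Stand-in for a real deployment: slow on the first call, fast afterwards."""

    def __init__(self) -> None:
        self.loaded = False

    def __call__(self) -> str:
        if not self.loaded:
            time.sleep(0.2)   # simulated model load (the "cold start")
            self.loaded = True
        time.sleep(0.01)      # simulated inference
        return "ok"


first, warm = measure_cold_start(FakeEndpoint())
print(f"first call: {first:.2f}s, warm mean: {warm:.2f}s")
```

Against a real service, make sure the deployment has actually scaled to zero before the first call, otherwise you are only measuring warm latency.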