by naveen99 462 days ago
If you have the resources to fine-tune, you have the resources to run inference on the fine-tuned model.

If you want to scale up and down on demand, you can just fine-tune on OpenAI and Google Cloud as well.

1 comment

> If you have the resources to fine-tune, you have the resources to run inference on the fine-tuned model.

I don't think that's true.

I can fine-tune a model by renting a few A100s for a few hours, for a total cost in the double-digit dollars. It's a one-time cost.

Running inference with the resulting model for a production application could cost single-digit dollars per hour, which adds up to hundreds or even thousands of dollars a month on an ongoing basis.

This assumes that inference is needed 24/7.

That may or may not hold for use cases that need asynchronous, bulk inference _and_ some task-specific post-training.
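To make the arithmetic concrete, here is a rough sketch of the two costs. The function names, the 30-day month, and the $2/GPU-hour rate are my own illustrative assumptions, not real quotes from any provider.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def fine_tune_cost(num_gpus: int, hours: float, gpu_hourly_rate_usd: float) -> float:
    """One-time cost of renting GPUs for a fine-tuning run."""
    return num_gpus * hours * gpu_hourly_rate_usd


def monthly_inference_cost(hourly_rate_usd: float, hours_per_day: float = 24.0) -> float:
    """Recurring monthly cost of keeping an inference server up.

    hours_per_day=24 models the always-on case; a smaller value models
    bulk or asynchronous jobs that only need the server part of the day.
    """
    if not 0.0 <= hours_per_day <= 24.0:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate_usd * hours_per_day * DAYS_PER_MONTH


# Hypothetical numbers: 4 GPUs for 3 hours at $2 per GPU-hour.
one_time = fine_tune_cost(num_gpus=4, hours=3, gpu_hourly_rate_usd=2.0)  # 24.0

# Always-on inference at $1/hour and $9/hour (the single-digit range).
low_24_7 = monthly_inference_cost(1.0)   # 720.0
high_24_7 = monthly_inference_cost(9.0)  # 6480.0

# The same $1/hour server used for a 2-hour batch job each day.
batch_only = monthly_inference_cost(1.0, hours_per_day=2.0)  # 60.0
```

Under these assumptions the always-on case costs 30x to 270x the fine-tuning run every month, while the batch-only case is in the same ballpark as the one-time cost.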

FWIW, my approach to tasks like the above is to:

1. start with an off-the-shelf LM API,

2. use evals that capture product intent to figure out what the failure modes are (there always are some), and then

3. post-train against those failure modes (using the evals).
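A minimal sketch of step 2, assuming each eval is just a prompt plus a pass/fail check. `EvalCase`, `find_failures`, `failure_modes`, and `stub_model` are hypothetical names for illustration; `stub_model` stands in for whatever LM API call you would actually make.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True if the output meets product intent
    tag: str                      # label for the failure mode this case probes


def find_failures(model: Callable[[str], str], cases: list[EvalCase]) -> list[EvalCase]:
    """Run every eval case through the model and keep the ones that fail."""
    return [case for case in cases if not case.check(model(case.prompt))]


def failure_modes(failures: list[EvalCase]) -> Counter:
    """Count failures per tag to see which modes are worth post-training on."""
    return Counter(case.tag for case in failures)


def stub_model(prompt: str) -> str:
    # Stand-in for an off-the-shelf LM API call; it always answers in lowercase.
    return prompt.lower()


cases = [
    EvalCase("Say OK", lambda out: "ok" in out, tag="basic"),
    EvalCase("Reply in UPPERCASE", lambda out: out.isupper(), tag="formatting"),
    EvalCase("SHOUT the word hi", lambda out: out.isupper(), tag="formatting"),
]

failures = find_failures(stub_model, cases)
modes = failure_modes(failures)
print(modes)  # Counter({'formatting': 2})
```

The failing cases would become the seed for a post-training set, and rerunning the same evals on the tuned model tells you whether step 3 actually helped.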