Hacker News
by simonw 462 days ago
> If you have the resources to fine tune, you have the resources to run inference on fine tuned model.

I don't think that's true.

I can fine-tune a model by renting a few A100s for a few hours, for a total cost in the double-digit dollars. It's a one-time cost.

Running inference with the resulting model for a production application could cost single-digit dollars per hour, which adds up to hundreds or even thousands of dollars a month on an ongoing basis.
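To make the arithmetic concrete, here is a rough sketch. Every number in it is an assumption picked to land inside the ranges mentioned above (4 GPUs, 3 hours, $2 per GPU-hour, $1 to $5 per hour for serving); none of them are real quotes from a provider.

```python
# Hypothetical prices and durations, chosen only to illustrate the comparison.
HOURS_PER_MONTH = 24 * 30  # 720, assuming a 30-day month


def fine_tune_cost(gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """One-time cost of renting GPUs for a single fine-tuning run."""
    return gpus * hours * usd_per_gpu_hour


def monthly_inference_cost(usd_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Recurring cost of keeping an inference server up for `hours` each month."""
    return usd_per_hour * hours


one_time = fine_tune_cost(gpus=4, hours=3, usd_per_gpu_hour=2.0)  # 4 * 3 * 2.0
low = monthly_inference_cost(1.0)   # $1/hour, always on
high = monthly_inference_cost(5.0)  # $5/hour, always on
print(f"fine-tune once: ${one_time:.0f}; serve 24/7: ${low:.0f} to ${high:.0f} per month")
```

With these made-up inputs the one-time run costs $24, while always-on serving costs $720 to $3,600 every month.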

1 comment

This assumes that inference is needed 24/7.

That may or may not be true for use-cases that require asynchronous, bulk inference _and_ require some task-specific post-training.
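As a rough illustration of why the 24/7 assumption matters, the sketch below compares an always-on server with a nightly bulk job that only rents the machine while it runs. The workload size, throughput, hourly price, and per-whole-hour billing are all hypothetical.

```python
import math


def batch_job_cost(items: int, items_per_hour: int, usd_per_hour: float) -> float:
    """Cost of one bulk run, assuming billing in whole hours (an assumption)."""
    hours = math.ceil(items / items_per_hour)
    return hours * usd_per_hour


USD_PER_HOUR = 3.0  # hypothetical price for one inference machine

# Always on: 24 hours a day for a 30-day month.
always_on = 24 * 30 * USD_PER_HOUR

# Bulk: 50,000 items per night at 20,000 items/hour is 2.5 hours, billed as 3.
nightly = 30 * batch_job_cost(items=50_000, items_per_hour=20_000, usd_per_hour=USD_PER_HOUR)

print(f"always on: ${always_on:.0f}/month, nightly bulk: ${nightly:.0f}/month")
```

Under these assumptions the bulk job comes to $270 a month against $2,160 for the always-on server, so the gap between fine-tuning cost and inference cost narrows a lot when the work can be batched.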

FWIW, my approach to tasks like the above is to:

1. start with an off-the-shelf LM API, until

2. evals that capture product intent show what the failure modes are (there always are some), and then

3. post-train against those failure modes (using the same evals)
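
A minimal sketch of what step 2 could look like, assuming evals are written as small checks tagged with a failure-mode label. `EvalCase`, `failure_modes`, and `stub_model` are hypothetical names for illustration; `stub_model` stands in for whatever off-the-shelf LM API is used in step 1.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # encodes the product intent for this prompt
    tag: str                      # failure-mode label, e.g. "format"


def failure_modes(model: Callable[[str], str], cases: list[EvalCase]) -> Counter:
    """Run every case through the model and count failed checks per tag."""
    return Counter(c.tag for c in cases if not c.check(model(c.prompt)))


# Stand-in for an off-the-shelf LM API call; a real client would go here.
def stub_model(prompt: str) -> str:
    return prompt.upper()


cases = [
    EvalCase("total: 42", lambda out: "42" in out, tag="numbers"),
    EvalCase("reply in lowercase", lambda out: out.islower(), tag="format"),
    EvalCase("keep it short", lambda out: len(out) < 40, tag="length"),
]

failures = failure_modes(stub_model, cases)
print(failures)  # only the "format" check fails for this stub
```

The tags with the most failures point at where post-training data is worth collecting, and rerunning the same cases after post-training shows whether those failure modes actually went away.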