HN Mirror


by naasking 793 days ago

> You'd have to keep the LLM on a constant RLHF fine tuning treadmill in order for it to actually learn from errors it might make, and then that re-opens the can of worms of catastrophic forgetting and the like.

If the LLM required a constant fine-tuning treadmill, you wouldn't actually use it in this application. You could tell if you were on such a treadmill because its error rate wouldn't be improving fast enough in the initial phases while you were still checking its work.
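To make the "treadmill" check concrete, here is a minimal sketch, assuming you log how many outputs you reviewed and how many were wrong in each review batch. The function names, the first-versus-last comparison, and the `min_drop` threshold are all illustrative assumptions, not anything prescribed above.

```python
from typing import Sequence


def error_rates(errors: Sequence[int], reviewed: Sequence[int]) -> list[float]:
    """Per-batch error rate from human-checked outputs (hypothetical log format)."""
    if len(errors) != len(reviewed):
        raise ValueError("errors and reviewed must have the same length")
    if any(n <= 0 for n in reviewed):
        raise ValueError("every batch must contain at least one reviewed output")
    return [e / n for e, n in zip(errors, reviewed)]


def on_treadmill(rates: Sequence[float], min_drop: float = 0.1) -> bool:
    """Return True if the error rate is not improving fast enough.

    Assumption: "fast enough" means the last batch's error rate is at least
    `min_drop` (relative) below the first batch's. The threshold is arbitrary.
    """
    if len(rates) < 2:
        raise ValueError("need at least two batches to see a trend")
    first, last = rates[0], rates[-1]
    if first == 0:
        return False  # nothing to improve on
    relative_drop = (first - last) / first
    return relative_drop < min_drop


# Made-up numbers: three batches of 100 reviewed outputs each.
improving = error_rates([20, 12, 7], [100, 100, 100])
flat = error_rates([20, 19, 21], [100, 100, 100])
print(on_treadmill(improving))  # False
print(on_treadmill(flat))       # True
```

A real check would want more batches and some allowance for noise, but the point is the same: the trend is visible while you are still reviewing the work anyway.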

As for what recourse you have in case of error, that's what fine-tuning is for. Your recourse is to change the fine-tuning so it handles those errors better, just as you would correct a human employee.

Employees are not financially liable for the mistakes they make either; only their job is at stake. But this is all beside the point. At the end of the day, the only rational question is: if the LLM's error rate is equal to or lower than a human employee's, why prefer the human?