Hacker News
by jdietrich 889 days ago
>suppose you find that the model has a bias in terms of labeling African Americans as criminals; or women as lousy computer programmers. If all you have is the model weights of the trained model, how easily can you fix the model?

That's textbook fine-tuning and is basically trivial. Adding another layer and training only that layer is many orders of magnitude more efficient than retraining the whole model and works ~exactly as well.
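For anyone who wants the mechanics spelled out, here is a minimal sketch of "freeze the model, add a layer, train only that layer". Everything in it is an assumption made for illustration: the toy `frozen_base` stands in for the pretrained model, the names `train_head` and `mean_loss` are made up, and the data is synthetic. It only shows that the original weights stay untouched while the new layer learns. It says nothing about how well this removes a real bias.

```python
import math
import random


def frozen_base(x, base_weights):
    """Stand-in for the pretrained model: its weights are read, never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in base_weights]


def head(features, head_w, head_b):
    """The newly added layer: a single logistic unit on top of the frozen features."""
    z = sum(w * f for w, f in zip(head_w, features)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(data, base_weights, head_w, head_b):
    """Average binary cross-entropy of base + head over the dataset."""
    total = 0.0
    for x, y in data:
        p = head(frozen_base(x, base_weights), head_w, head_b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_head(data, base_weights, steps=200, lr=0.5):
    """Gradient descent on the head only; base_weights is never written to."""
    n = len(base_weights)
    head_w, head_b = [0.0] * n, 0.0
    # The base is frozen, so its features can be computed once and reused.
    feats = [(frozen_base(x, base_weights), y) for x, y in data]
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for f, y in feats:
            err = head(f, head_w, head_b) - y  # d(loss)/d(logit) for cross-entropy
            for i in range(n):
                grad_w[i] += err * f[i]
            grad_b += err
        head_w = [w - lr * g / len(feats) for w, g in zip(head_w, grad_w)]
        head_b -= lr * grad_b / len(feats)
    return head_w, head_b


rng = random.Random(0)
# Hypothetical "pretrained" weights: 4 hidden units over 3 inputs.
base_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
base_snapshot = [row[:] for row in base_weights]

# Synthetic corrective dataset with a made-up labeling rule.
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

loss_before = mean_loss(data, base_weights, [0.0] * 4, 0.0)
head_w, head_b = train_head(data, base_weights)
loss_after = mean_loss(data, base_weights, head_w, head_b)
```

The cost argument shows up in `train_head`: the frozen features are computed once, and each step updates five numbers instead of every weight in the model.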

Models are data, not instructions. Analogies to software are actively harmful. We do not fix bugs in models any more than we fix bugs in a JPEG.

1 comment

Instructions are exactly what weights are. We just have no idea what those instructions are.