Hacker News
by chaxor 1133 days ago
It could be from training it more to be safer. Microsoft noted this early on with GPT-4. Specifically, on the TikZ unicorn qualitative benchmark, the unicorn got better with more epochs, which is obviously expected.

However, very interestingly, the unicorn image got far worse when they trained the model to be safer by trying to correct discrimination against various demographics.

It isn't very intuitive to me why that would occur, and it seems to conflict with what has been shown in ROME, etc. So I'm surprised it hasn't been commented on more. It's certainly one of the best examples of how we don't understand what's going on inside these models, and of how changing them can cause very unexpected outcomes.