by godelski
938 days ago
In this same vein, don't go down the rabbit hole of image libraries. PIL resizing is different from PyTorch resizing. The effects are actually large enough to affect your models. Yes, even one trained at fp16. It can also result in people being displaced on leaderboards even when the only change is made during evaluation. I think a lot of people, including a lot of ML researchers, would be surprised by these subtle effects and their influence. It really makes metrics a bit fuzzier than they already are (and they're already pretty fuzzy now that we're in high quality domains). But at the same time it __shouldn't__ surprise people given how strong the lottery effect is. Evaluation is just fucking hard, but we all like to get caught up in leaderboards. You just can't quantify concepts such as "most realistic image" or "most realistic language." Hell, that's a big reason why we invented RLHF.
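To make the resizing point concrete, here's a rough sketch of how two "bilinear" downscalers can disagree. This is a simplified 1D model in plain Python, not the actual code of PIL or PyTorch; the function names are made up for illustration. The assumption it leans on is that, as far as I know, PIL's resize filters widen their support when downscaling (antialiasing), while `torch.nn.functional.interpolate` does not antialias unless you ask it to.

```python
# Simplified 1D model of two "bilinear" downscalers. This is NOT the code of
# PIL or PyTorch; it only mimics antialiased vs. non-antialiased behavior.
# Both function names are hypothetical.

def resize_no_antialias(signal, out_len):
    """Linear interpolation between the two nearest input samples."""
    scale = len(signal) / out_len
    out = []
    for i in range(out_len):
        # Map the output pixel center back into input coordinates.
        x = (i + 0.5) * scale - 0.5
        x = min(max(x, 0.0), len(signal) - 1.0)
        lo = int(x)
        hi = min(lo + 1, len(signal) - 1)
        frac = x - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def resize_antialias(signal, out_len):
    """Triangle filter whose support widens with the downscale factor."""
    scale = len(signal) / out_len
    support = max(scale, 1.0)
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale
        total = 0.0
        weight_sum = 0.0
        for j, value in enumerate(signal):
            # Distance from the input pixel center to the filter center,
            # measured in units of the filter support.
            d = abs((j + 0.5) - center) / support
            w = max(0.0, 1.0 - d)
            total += w * value
            weight_sum += w
        out.append(total / weight_sum)
    return out


# A thin bright line every fourth sample: fine detail relative to a 4x downscale.
thin_lines = [0.0, 0.0, 0.0, 255.0] * 4   # 16 samples
a = resize_no_antialias(thin_lines, 4)    # the bright samples are never touched
b = resize_antialias(thin_lines, 4)       # the bright samples get averaged in
max_diff = max(abs(p - q) for p, q in zip(a, b))
print("no antialias:", a)
print("antialias:   ", b)
print("max abs diff:", max_diff)
```

In this toy case the non-antialiased version drops the bright lines entirely while the antialiased one keeps them as a gray level, so the two outputs differ by tens of intensity levels out of 255. Real images are less extreme, but it's the same mechanism, which is why it's worth pinning the resize implementation when comparing numbers.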