Page 9 of the Llama tech report has an interesting graph that predicts task-level performance from the cross-entropy loss. The sigmoidal model fits well, and at the steepest part of the S, a 0.01 change in NLL is worth about 5% in task-level accuracy.
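To get a feel for those numbers, here is a rough sketch of a decreasing sigmoid from NLL to accuracy. Everything in it is an assumption for illustration, not taken from the report: the function names, the midpoint of 0.5, the floor and ceiling of 0 and 1, and reading "5%" as 5 percentage points. The only thing it shows is that a logistic curve has a maximum slope of (ceiling - floor) * steepness / 4, so a steepness of about 20 per unit of NLL gives roughly 0.05 accuracy per 0.01 NLL at the midpoint.

```python
import math


def accuracy_from_nll(nll, midpoint=0.5, steepness=20.0, floor=0.0, ceiling=1.0):
    """Decreasing logistic curve: lower NLL maps to higher accuracy.

    All parameter values are hypothetical, chosen only so the slope at the
    midpoint roughly matches the "0.01 NLL is about 5% accuracy" figure.
    """
    return floor + (ceiling - floor) / (1.0 + math.exp(steepness * (nll - midpoint)))


def max_slope(steepness, floor=0.0, ceiling=1.0):
    """Magnitude of the slope at the midpoint, the steepest part of the S."""
    return (ceiling - floor) * steepness / 4.0


# Accuracy gained by lowering NLL by 0.01, centered on the (assumed) midpoint.
delta = 0.01
gain = accuracy_from_nll(0.5 - delta / 2) - accuracy_from_nll(0.5 + delta / 2)

print(f"linear estimate at midpoint: {max_slope(20.0) * delta:.4f}")
print(f"actual gain over 0.01 NLL:   {gain:.4f}")
```

Away from the midpoint the same 0.01 change buys much less, which is the point of the S shape: the curve is nearly flat at both ends.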
Thanks, that's actually pretty much what I was thinking. I'll have to read that to try to understand the significance of the S curve there. Pretty interesting. Appreciate the link!