For binary labels: take a slice of labeled data. If the mean of the model's predictions on that slice differs from the mean of the labels, the model is miscalibrated on that slice. In practice, the term is often used as a synonym for "loss is worse / could be better".
Not sure if that's what the GP meant; I've only worked with binary labels.
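To make the slice check concrete, here's a rough sketch in plain Python. The function name `calibration_gap` and the numbers are made up for illustration; it just compares the mean prediction to the mean label on one slice.

```python
from statistics import fmean

def calibration_gap(preds, labels):
    """Mean predicted probability minus the observed positive rate on a slice.

    A value near 0 means the slice looks calibrated on average;
    a positive value means the model over-predicts the positive class.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return fmean(preds) - fmean(labels)

# Hypothetical slice: predicted probabilities and the true binary labels.
slice_preds = [0.9, 0.8, 0.7, 0.6]
slice_labels = [1, 0, 1, 0]

gap = calibration_gap(slice_preds, slice_labels)
print(f"mean prediction - mean label = {gap:.2f}")
```

Here the model predicts 0.75 on average while only half the labels are positive, so it over-predicts on this slice. Note that a gap of zero only says the averages match; a model can still be off within sub-slices.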
Calibration (in a binary context) basically means that the confidence a model/score outputs matches the actual probability that the label is positive.
For instance, a calibrated classifier predicting a fair coin flip should output 50-50. A poorly calibrated classifier would output higher confidence for heads or for tails than the coin justifies.
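A quick simulation of the coin flip case, as a sketch only: the 0.9 "overconfident" value, the seed, and the sample size are arbitrary choices of mine. It compares a predictor that always says 0.5 for heads with one that always says 0.9, using the mean gap and log loss.

```python
import math
import random

random.seed(0)  # arbitrary seed so the run is repeatable
# Simulated fair coin: 1 = heads, 0 = tails.
flips = [1 if random.random() < 0.5 else 0 for _ in range(10_000)]

def mean_gap(p, labels):
    """Constant predicted probability of heads minus observed heads rate."""
    return p - sum(labels) / len(labels)

def log_loss(p, labels):
    """Average negative log-likelihood of a constant prediction p."""
    total = 0.0
    for y in labels:
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(labels)

calibrated_p = 0.5      # matches the true heads probability
overconfident_p = 0.9   # hypothetical poorly calibrated predictor

calibrated_gap = mean_gap(calibrated_p, flips)
overconfident_gap = mean_gap(overconfident_p, flips)
calibrated_loss = log_loss(calibrated_p, flips)
overconfident_loss = log_loss(overconfident_p, flips)

print(f"calibrated:    gap={calibrated_gap:+.3f}  log loss={calibrated_loss:.3f}")
print(f"overconfident: gap={overconfident_gap:+.3f}  log loss={overconfident_loss:.3f}")
```

The 50-50 predictor ends up with a gap near zero and a log loss of ln 2, while the 0.9 predictor has a large gap and a noticeably worse loss, even though neither can actually predict the coin.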