This is correct. The C++ standard specifies that sort must have a big-O complexity of n log(n). This can be seen on pdf page 925 of this working draft from 2014 [1].
No, f = O(n log n) means that f doesn't grow significantly faster than n log n. That is true for f = n log n, but it's also true for f = n. The "or better" is implicit by using big-O.
Note that this wouldn't be true if the standard said that it has to be Θ(n log n).
The standard doesn't seem to say "or better" here, but I know that in other places it does (or least used to say something similar).