That's not true, though. It just counts the number of code units, and that isn't version-dependent. It's certainly no worse than counting the number of UTF-16 code units (I'd argue it's better, since it's less arbitrary: whether something is a Unicode scalar value is a design decision, while whether something is in the BMP or not is mostly an accident of implementation).
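To make the comparison concrete, here's a rough Python sketch of the three counts in question. The helper names are made up for illustration, and I'm assuming UTF-8 as the "code unit" encoding being discussed; none of these numbers depend on which Unicode version you have, since they fall straight out of the encoding.

```python
# Hypothetical helpers, for illustration only.

def utf8_code_units(s: str) -> int:
    # One UTF-8 code unit is one byte.
    return len(s.encode("utf-8"))

def utf16_code_units(s: str) -> int:
    # One UTF-16 code unit is two bytes; "utf-16-le" avoids emitting a BOM.
    return len(s.encode("utf-16-le")) // 2

def code_points(s: str) -> int:
    # Python's len() counts code points.
    return len(s)

sample = "a\u00e9\U0001F600"  # 'a', 'é', and an emoji outside the BMP

print(utf8_code_units(sample))   # 1 + 2 + 4 bytes
print(utf16_code_units(sample))  # 1 + 1 + 2 (surrogate pair)
print(code_points(sample))       # 3
```

The UTF-16 count and the code point count only disagree on the last character, and only because it happens to sit outside the BMP.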