I interpret each directive in a Dockerfile as creating a new layer of an image. So this ARG-before-FROM gotcha doesn't feel like a gotcha to me; it's the consequence of reading "ARG" literally without knowing the side effects of a directive in a Dockerfile. (Yes, even WORKDIR, ENTRYPOINT, and related instructions create a layer, albeit a 0-byte one.)
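For anyone who hasn't run into it, here's a minimal sketch of the ARG-before-FROM behaviour as I understand it. The `alpine` base image, the `TAG` name, and the default value are just placeholders I picked for illustration:

```dockerfile
# Declared before FROM, so it sits outside any build stage.
# (Image name and tag value are illustrative assumptions.)
ARG TAG=3.19
FROM alpine:${TAG}

# TAG can be used in the FROM line above, but it is not set inside the stage,
# so the shell expands it to an empty string here.
RUN echo "before redeclaring: '${TAG}'"

# Redeclaring it without a value pulls the pre-FROM default into this stage.
ARG TAG
RUN echo "after redeclaring: '${TAG}'"
```

Building with something like `docker build --progress=plain .` should show the two echo lines, assuming the steps aren't served from cache.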
If you need to document a surprise that a user wouldn't otherwise have expected, maybe that's a sign the behaviour should be fixed so that it's no longer surprising.