By that reasoning, almost every format is a composite, which doesn't sound like a useful distinction. Such metadata should be fine as long as it is isolated and can be updated independently of the parent format.
My reasoning for Exif was that it is not only auxiliary but also post-hoc. Exif was defined independently from image formats and only got adopted later because those formats provided extension points (JPEG APP# markers, PNG chunks).
You've got a good point that there are multiple types of metadata, and some metadata might be crucial for interpreting the data. I would say such "structural" metadata should be considered part of the data. I'm not saying it isn't metadata; it is metadata inside some data, so it doesn't count for our purpose of defining a composite.
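To make the structural/auxiliary split a bit more concrete, here is a rough sketch using PNG, which happens to encode a similar distinction in the chunk type itself: a lowercase first letter marks a chunk as ancillary (safe to ignore), an uppercase one as critical. IHDR, which carries the dimensions, bit depth and color type, is critical, while eXIf is ancillary. Note that this is PNG's own classification, not my definition of "composite", and the helper names (`make_chunk`, `classify_chunks`, `build_demo_png`) are made up for illustration. The eXIf payload is a placeholder, not valid Exif data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, payload):
    """Serialize one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def classify_chunks(png_bytes):
    """Walk the chunk list and label each chunk as critical or ancillary.

    The walker only reads lengths and types; it does not verify CRCs.
    """
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    result = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        (length,) = struct.unpack_from(">I", png_bytes, pos)
        chunk_type = png_bytes[pos + 4 : pos + 8]
        # Bit 5 of the first type byte (i.e. lowercase letter) means "ancillary".
        ancillary = bool(chunk_type[0] & 0x20)
        result.append((chunk_type.decode("ascii"), "ancillary" if ancillary else "critical"))
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return result


def build_demo_png():
    """Build a 1x1, 8-bit greyscale PNG with a placeholder eXIf chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, depth, color type, ...
    idat = zlib.compress(b"\x00\x00")  # filter byte + one grey pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"eXIf", b"placeholder")  # assumption: dummy bytes, not real Exif
        + make_chunk(b"IDAT", idat)
        + make_chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    for name, kind in classify_chunks(build_demo_png()):
        print(name, kind)
```

Here the eXIf chunk could be dropped or rewritten without affecting how the pixels are decoded, whereas IHDR could not.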
I also don't think tar hardlinks are metadata for our purpose. A hardlink entry technically consists of the linked path (in place of the file contents) plus the information that the file is a hardlink. The former is clearly data; the latter is metadata used to reconstruct the original file system, so it should be considered part of a larger piece of data (in this case, the logical notion of a "file").
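For what it's worth, this is easy to see with Python's standard `tarfile` module. The following is only a minimal sketch: the archive is built in memory, the file names are arbitrary, and `build_demo_archive` and `hardlink_entries` are hypothetical helper names of my own.

```python
import io
import tarfile


def build_demo_archive():
    """Build an in-memory tar with one regular file and one hard link to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        original = tarfile.TarInfo("a.txt")
        original.size = len(data)
        tar.addfile(original, io.BytesIO(data))

        # A hard-link member carries a link target instead of file contents.
        link = tarfile.TarInfo("b.txt")
        link.type = tarfile.LNKTYPE
        link.linkname = "a.txt"
        tar.addfile(link)
    return buf.getvalue()


def hardlink_entries(archive_bytes):
    """Return (name, linkname, size) for every hard-link member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        return [(m.name, m.linkname, m.size) for m in tar.getmembers() if m.islnk()]


if __name__ == "__main__":
    print(hardlink_entries(build_demo_archive()))
```

The `b.txt` entry is stored with a size of 0 and a `linkname` of `a.txt`: the linked path stands in for the contents, and the entry type says it is a hardlink.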
I believe these examples should be enough to derive my definition of "composite". Please let me know if they aren't.
Your reply suggests that if all the metadata is auxiliary, it can be segregated from the data and the format doesn't count as a composite.
However, that doesn't exclude archives: in many use cases the file metadata is as important as the data itself (consider, e.g., hardlinks in TAR files).
Nor does it exclude certain vital metadata for images: resolution, color space, and bit depth come to mind.