Interesting that they show support for USD(Z) in the example. USD will be the standard for most VFX houses, which should make tooling for various web-based systems much easier.
It's a schema specification with various plugins. I guess it will mainly be about the geometry specification for models (and perhaps ignore the shaders etc.). Documentation is hard to come by; you basically have to look at the source and examples.
We have a 3D model format for the web in glTF. This is Apple having Not Invented Here Syndrome again. USD was developed at Pixar, of which Steve Jobs was the founding investor. Apple is the only one who wants USDZ on the Web. Everyone else is already using glTF.
Nvidia are also leaning very hard into USD with their metaverse project. In that case USD matches requirements perfectly.
glTF 2.0 is widely supported although the quality of that support is variable. The spec is fairly large and every tool that exports glTF does so in a subtly different way.
For instance, Blender adds a rotation transform node before every mesh to convert from Z-up (Blender) to Y-up (glTF). Animations run in Z-up space and target the node before that final rotation node. The Babylon.js exporter for 3ds Max instead converts its meshes and animations to Y-up as part of the export process.
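To make the axis change concrete, here is a minimal sketch of converting points between a right-handed Z-up space and a right-handed Y-up space. The function names are made up for illustration, and this is not taken from any exporter's source; it only shows the kind of remapping an exporter has to either bake into the data or express as an extra rotation node.

```python
def z_up_to_y_up(point):
    """Map a point from right-handed Z-up space to right-handed Y-up space.

    Equivalent to a -90 degree rotation about the X axis:
    the old +Z (up) becomes +Y, and the old +Y becomes -Z.
    """
    x, y, z = point
    return (x, z, -y)


def y_up_to_z_up(point):
    """Inverse mapping: right-handed Y-up back to right-handed Z-up."""
    x, y, z = point
    return (x, -z, y)


if __name__ == "__main__":
    up_in_z_up_space = (0, 0, 1)
    print(z_up_to_y_up(up_in_z_up_space))
```

Baking this into vertices and animation keys gives a flat, simple scene graph; adding a rotation node instead keeps the authoring data untouched but requires loaders to honor the whole node hierarchy.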
Many glTF loader implementations get tripped up on this (failing to support more than one mesh per model, failing to respect the full transform hierarchy, or failing to support animations). There are similar complexities and support issues around vertex data formats, buffer layouts, punctual lights, etc.
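As a rough sketch of what "respecting the full transform hierarchy" means, the snippet below walks the node tree of an already-parsed glTF JSON document and accumulates a world matrix per node. It is deliberately simplified: it handles the `matrix` property and `translation` only (rotation and scale are omitted), and the helper names are hypothetical rather than part of any loader library.

```python
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def local_matrix(node):
    """Local transform of a glTF node as a list of rows.

    Simplification: only `matrix` and `translation` are handled here.
    """
    if "matrix" in node:
        m = node["matrix"]  # 16 numbers, column-major in glTF 2.0
        return [[m[col * 4 + row] for col in range(4)] for row in range(4)]
    tx, ty, tz = node.get("translation", (0, 0, 0))
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def world_matrices(gltf, scene_index=0):
    """Return {node_index: world_matrix} for every node in the scene."""
    nodes = gltf["nodes"]
    result = {}
    stack = [(i, IDENTITY) for i in gltf["scenes"][scene_index]["nodes"]]
    while stack:
        index, parent_world = stack.pop()
        world = mat_mul(parent_world, local_matrix(nodes[index]))
        result[index] = world
        for child in nodes[index].get("children", []):
            stack.append((child, world))
    return result


# Tiny example document: a root node with one child that carries the mesh.
example = {
    "scenes": [{"nodes": [0]}],
    "nodes": [
        {"translation": [1, 0, 0], "children": [1]},
        {"translation": [0, 2, 0], "mesh": 0},
    ],
}
```

A loader that applies only the mesh node's own transform would place the example mesh at (0, 2, 0); following the parent chain places it at (1, 2, 0).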
My limited reading of the USD format is that it has an even wider specification. My concern is that browser support will splinter even worse than it would with glTF. The 'model' proposal doesn't dig into what USD nodes will and won't be supported and exactly how the data is expected to be rendered.
Blender is Z-up and glTF is Y-up, but regardless, this is a long-standing issue with Blender. These problems exist for all of the model export formats coming out of Blender. How is that glTF's fault?
glTF, especially the binary format, is specifically designed for transmission over the web. Wavefront is uncompressed text, and Collada is even worse because it's extremely verbose XML. I don't know a lot about USDZ, but what I do know is that it's overkill for the web (graphics features that are mostly too costly to do in WebGL, additional scene description data like audio tracks, etc.), while also lacking specific features that glTF has that are aimed at the web, like streaming and progressive download of texture data.
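For a sense of how compact the binary container is, here is a minimal sketch of writing and reading a GLB file with only the standard library. The layout (a 12-byte little-endian header followed by length-prefixed JSON and BIN chunks) follows the glTF 2.0 spec; the function names `build_glb` and `parse_glb` are invented for this example, and real loaders do considerably more validation.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN" plus a trailing zero byte


def build_glb(document, binary=b""):
    """Pack a JSON document and optional binary buffer into GLB bytes."""
    json_bytes = json.dumps(document).encode("utf-8")
    json_bytes += b" " * (-len(json_bytes) % 4)   # pad JSON with spaces
    binary += b"\x00" * (-len(binary) % 4)        # pad BIN with zeros
    chunks = struct.pack("<II", len(json_bytes), CHUNK_JSON) + json_bytes
    if binary:
        chunks += struct.pack("<II", len(binary), CHUNK_BIN) + binary
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + len(chunks))
    return header + chunks


def parse_glb(data):
    """Return (json_document, binary_buffer) from GLB bytes.

    The returned buffer still includes any zero padding added on write.
    """
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    if version != 2:
        raise ValueError("unsupported GLB version: %d" % version)
    if length > len(data):
        raise ValueError("file is truncated")
    document, binary = None, b""
    offset = 12
    while offset + 8 <= length:
        chunk_length, chunk_type = struct.unpack_from("<II", data, offset)
        start = offset + 8
        payload = data[start:start + chunk_length]
        if chunk_type == CHUNK_JSON:
            document = json.loads(payload.decode("utf-8"))
        elif chunk_type == CHUNK_BIN:
            binary = payload
        offset = start + chunk_length  # unknown chunk types are skipped
    return document, binary
```

Because the JSON chunk comes first and declares its own length, a client can start interpreting the scene description before the rest of the buffer has arrived.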
https://graphics.pixar.com/usd/docs/
The closest thing I can find there is the glossary and the C++ doxygen api docs.