OpenUSD has a plaintext encoding mode, which by convention gets saved as .usda. The standard OpenUSD SDK contains two command line utilities, `usdcat` and `usdedit`, for plaintext editing, but most programs using the SDK get better editing functionality by using the C++/Python SDK directly.
The ASCII version (plain text, .usda file extension), absolutely.
For large scenes/models, the normal .usd binary/compressed version is often used for efficiency reasons (and proper round-tripping of float values for xforms, etc.), but you can convert between the two with the 'usdcat' util and the Python/C++ APIs for debugging.
Why do so many (all?) textual data serialization formats represent floats in base-10 scientific notation, anyway?
If we wanted floats that are 1. human-editable but 2. bijective with IEEE754, wouldn't floating-point hexadecimal (and "e" notation representing a base-2 exponent) be a better idea?
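For reference, here is a small sketch of what hexadecimal floats look like in practice, using Python's built-in `float.hex` and `float.fromhex`. One detail differs from the suggestion above: Python (like C99) writes the base-2 exponent with "p" rather than "e", since "e" is itself a hex digit.

```python
# Hex float round trip using only the standard library.
x = 0.1
text = x.hex()               # exact textual form of the stored double
back = float.fromhex(text)   # parse it back

print(text)        # 0x1.999999999999ap-4
print(back == x)   # True

# Hand-written hex floats parse too: 0x1.8 is 1.5, times 2**1.
three = float.fromhex("0x1.8p1")
print(three)       # 3.0
```

Every finite double maps to exactly one such string and back, which is the bijection being asked for.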
I mean, depends on the human. Most don't know hexadecimal, but know what 3.14 means.
The real issue is why do so many float parsers and printers fail to do exact round tripping? Designing a good algorithm for this was a bit difficult, but these days this is a solved problem.
If I had to take a slightly snide guess: because these are low level tools, so there's a 90% chance that these parsers/printers are written in C, or ultimately depend on C implementations. As any C programmer would know, C loves to throw "undefined behavior" at any problem it doesn't bother to document. Which is a lot.
That, combined with almost zero package management for retrieving things that were solved decades ago, means we keep running into this issue, partially because of the mindset of C programmers.
This is just "hur dur C undefined lol" level of a comment.
If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse. If that means implementations have to ship their own float formatter and parser then so be it - no one is tied to whatever comes with their libc, package manager or no package manager.
But it isn't just about undefined behavior, it's more about the culture of C and how it approaches package management and sharing (or in this case, doesn't). Even if C had Rust-level correctness checking, it would have the same issue.
>If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse.
Well I guess we have our answer in the case of seriousness. I'm guessing it didn't matter enough for the implementers, or it did matter but could never actually get it implemented. The reasons for this are numerous, contextual (with context we'll never have), and probably not rooted in technical reasoning.
We are talking about the domain of animation and games, after all. Not mission critical code. There's more wiggle room, especially given the complexity of media around the time the format was being developed.
The best of both worlds, at least in my opinion, would be to write a float as a polynomial in two parts (where either part alone is still a float): an integer part with an optional scientific-notation exponent; and a fractional part, where the fraction's denominator is always a power of two.
So 5e3 is a float; 3/8 is a float; and 5e3+3/8 is a float. Each cleanly and exactly represents a particular IEEE754 value, while also being readable as a base-10 polynomial.
Maybe fractions of arithmetically-specified powers of two could also be allowed, for really big denominators. 3/2**26, for example.
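A rough Python sketch of how such a notation could be parsed, to make the idea concrete. The function names (`parse_exact`, `parse_term`) and the exact grammar are hypothetical, invented here for illustration; it handles only non-negative values joined by "+", and exponents written without a sign.

```python
from fractions import Fraction

def _parse_denominator(text: str) -> int:
    # Accept "8" or "2**26"; insist on a power of two.
    if "**" in text:
        base, exp = text.split("**")
        value = int(base) ** int(exp)
    else:
        value = int(text)
    if value <= 0 or value & (value - 1):
        raise ValueError("denominator must be a power of two")
    return value

def parse_term(term: str) -> Fraction:
    if "/" in term:
        num, den = term.split("/", 1)
        return Fraction(int(num), _parse_denominator(den))
    return Fraction(term)  # Fraction accepts strings such as "5e3"

def parse_exact(text: str) -> float:
    total = sum((parse_term(t.strip()) for t in text.split("+")), Fraction(0))
    value = float(total)
    # Reject anything a double cannot hold exactly.
    if Fraction(value) != total:
        raise ValueError(f"{text!r} is not exactly representable")
    return value

print(parse_exact("5e3+3/8"))                 # 5000.375
print(parse_exact("3/2**26") == 3 / 2**26)    # True
print((5000.375).as_integer_ratio())          # (40003, 8)
```

The last line shows the reverse direction: `float.as_integer_ratio` always returns a power-of-two denominator for a finite float, so a printer for this notation could be built on it.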
That's not particularly user-friendly though: at least for CG/VFX software (where USD came from and is designed for), non-technical (at least in terms of understanding IEEE floats) people like artists often want to look at the values to verify stuff for 'debugging' (i.e. is the software tool I'm using actually exporting the correct values I selected in the UI params panel).
Having to do any form of interpretation (even scientific notation is not ideal in some cases) is not great for many users.
This would seem to point to a fundamental impedance mismatch between textual dump formats used as debugging aids and textual project file formats, which are human-readable so that text-based tools can process the low-level data structures of the project before it's loaded back in.
Most OOP languages have a "debug print" or "shell inspect" method that the programmer can override, where by default the method will print something that's valid language syntax, but where the overrides aren't required or expected to be such, and instead should concisely describe the object at the expense of being reloadable. These same languages usually also have support for custom serializers for text-based serialization formats like JSON. The serializer implementation for JSON, and the serializer implementation for "shell inspect", are rarely identical.
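A minimal Python sketch of that split, assuming a made-up `Xform` class (not a USD type): `__repr__` gives a short decimal view that is not meant to be reloaded, while the JSON serializer stores hex floats so that values reload exactly.

```python
import json
from dataclasses import dataclass

@dataclass(repr=False)
class Xform:
    name: str
    translate: tuple

    def __repr__(self) -> str:
        # Debug view: concise, decimal, lossy, not parseable.
        coords = ", ".join(f"{v:.3g}" for v in self.translate)
        return f"<Xform {self.name} t=({coords})>"

    def to_json(self) -> str:
        # Serialized view: hex floats, so reloading is bit-exact.
        return json.dumps({
            "name": self.name,
            "translate": [v.hex() for v in self.translate],
        })

    @classmethod
    def from_json(cls, text: str) -> "Xform":
        data = json.loads(text)
        return cls(data["name"],
                   tuple(float.fromhex(h) for h in data["translate"]))

x = Xform("cube", (0.1, 1 / 3, 2.5))
print(repr(x))                            # <Xform cube t=(0.1, 0.333, 2.5)>
print(Xform.from_json(x.to_json()) == x)  # True
```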
I think what a CG/VFX artist would want here isn't that the canonical textual file-format "for import" gives them decimal-serialized floats, but rather that they have the option to "inspect" the project, resulting in a view that looks like e.g. https://www.tonymacx86.com/media/ioregistryexplorer.187440/f... (a hierarchically expandable "shell inspect" of the project). It makes perfect sense for the floats in such a read-only debugging-oriented view to be rendered in decimal (esp. if a raw canonical binary-data representation is given in parentheses beside the rendered value.)
FWIW, well-implemented round-to-nearest conversion routines (e.g. Python and IIRC glibc, although MSVC is historically bad about this) will roundtrip IEEE>decimal>IEEE if you use the correct number of digits (at least 9 for singles and 17 for doubles), for reasons of mathematics and not implementation. (The other way around also works, barring exponent under- and overflow, but for at most 6 and 15 digits respectively, so I wouldn’t call it bijective, strictly speaking.)
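A quick way to check the digit counts in Python, as a sketch. The helper names are made up here, and singles are emulated by rounding through `struct`, so the single-precision parse goes via a double first.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def double_roundtrips(x: float, digits: int = 17) -> bool:
    # Print with `digits` significant digits, parse back, compare.
    return float("%.*g" % (digits, x)) == x

def single_roundtrips(x: float, digits: int = 9) -> bool:
    s = to_single(x)
    return to_single(float("%.*g" % (digits, s))) == s

x = 0.1 + 0.2
print(double_roundtrips(x, 17))   # True
print(double_roundtrips(x, 15))   # False: "%.15g" prints 0.3
print(single_roundtrips(0.1))     # True

# Decimal > IEEE > decimal survives for up to 15 significant digits.
print("%.15g" % float("0.123456789012345"))   # 0.123456789012345
```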
The conversions are not even that hard ... unless you want to deal with arbitrary (and arbitrarily long) decimal representations and not just those that arise from IEEE numbers. Essentially the only choice to make is whether the conversion to decimal will emit all the digits all the time (simpler) or the shortest number of digits that will round to the requested IEEE float when read back (less liable to be mocked in webcomics[1]).
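To illustrate the two choices in Python: `"%.17g"` emits all the digits every time, while `repr` picks the shortest string that reads back to the same double. The `shortest_digits` helper below is a naive brute-force stand-in written for this example, not how real implementations do it, and it assumes finite input.

```python
x = 0.1
full = "%.17g" % x    # all 17 significant digits
short = repr(x)       # shortest string that round-trips

print(full)    # 0.10000000000000001
print(short)   # 0.1
print(float(full) == float(short) == x)   # True

def shortest_digits(value: float) -> str:
    # Try increasing precision until the text parses back exactly.
    # 17 digits always suffices for a finite double.
    for n in range(1, 18):
        text = "%.*g" % (n, value)
        if float(text) == value:
            return text
    raise ValueError("expected a finite float")

print(shortest_digits(0.1))         # 0.1
print(shortest_digits(0.1 + 0.2))   # 0.30000000000000004
```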
Of course, using hex floats is much simpler than even the simplest implementation of the above; I just want to point out that IEEE floats are perfectly roundtrippable through decimal.