I really want something that’s strongly typed but doesn’t require code generation like protobufs do. YAML doesn’t do it for me. The closest I can get is putting the type guarantees in the database and using GraphQL.
- doesn't require an extra compilation step or special definition files
- Has parallel binary and textual forms so that you're not wasting CPU and bandwidth serializing/deserializing text. Everything stays in binary except in the rare cases where humans want to look or edit.
Imho, statically typed languages are the ones that benefit most from schema. The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7. None of them support codegen either, just validation, so not exactly compelling.
> The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7
It's actually 2020-12, which is two versions after Draft 7 (they shifted from Draft n to YYYY-MM after Draft 7, and since then have had 2019-09 and 2020-12.)
And that's true of most languages, though there is some 2019-09 support. (It really doesn't help that there is also OpenAPI which baked in a variant—“extended subset”—of Draft 5 JSON Schema.)
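To make the naming concrete, here is a small sketch of how a schema declares its dialect through the `$schema` keyword. The `DIALECTS` constant and the example schema are illustrative names, not part of any library; only the URIs come from the JSON Schema project.

```ruby
require "json"

# Dialect identifiers as published by the JSON Schema project.
# DIALECTS is a hypothetical helper constant for this example only.
DIALECTS = {
  "draft-07" => "http://json-schema.org/draft-07/schema#",
  "2019-09"  => "https://json-schema.org/draft/2019-09/schema",
  "2020-12"  => "https://json-schema.org/draft/2020-12/schema"
}.freeze

# An illustrative schema: an array of strings, declared against 2020-12.
schema = {
  "$schema" => DIALECTS.fetch("2020-12"),
  "type"    => "array",
  "items"   => { "type" => "string" }
}

puts JSON.pretty_generate(schema)
```

A validator that only implements Draft 7 may reject or misinterpret a schema declared this way, which is why the version gap matters in practice.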
OpenAPI 3.1, which was released recently, uses JSON Schema 2020-12 as its primary schema format. As a result, we can expect further consolidation of tooling etc. in the community.
I benefit greatly from schema validation in Ruby. It ensures that ingress-processing code does not receive e.g. a String or Hash instead of an Array. Without it, things blow up well past the ingress edge when a call to an Array method fails, or worse, produce silently broken behaviour that may or may not blow up even farther down the road, because both String and Hash respond to e.g. #[](Integer).
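A minimal sketch of that failure mode, with a hand-rolled check standing in for a real JSON Schema validator. `first_tag` and `validate_tags!` are hypothetical names made up for illustration.

```ruby
# Hypothetical downstream code that assumes it was handed an Array.
def first_tag(tags)
  tags[0]
end

first_tag(["a", "b"])       # => "a"    (the intended case)
first_tag("ab")             # => "a"    (String#[] also accepts an Integer)
first_tag({ 0 => "zero" })  # => "zero" (Hash#[] treats 0 as a key)

# Hand-rolled edge check, standing in for schema validation at ingress.
def validate_tags!(payload)
  unless payload.is_a?(Array) && payload.all? { |tag| tag.is_a?(String) }
    raise ArgumentError, "expected an Array of Strings, got #{payload.class}"
  end
  payload
end

validate_tags!(["a", "b"])  # returns the payload unchanged
# validate_tags!("ab")      # would raise ArgumentError at the edge
```

None of the three `first_tag` calls raises, so the wrong shapes only surface later, if at all. The check at the edge turns that into an immediate, attributable error.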
I would certainly enjoy having a DSL to write descriptive code to validate using JSON schema, but it would be even better if the Ruby definitions could be generated and persisted in Ruby files using that DSL.
Also, storing things in basic hash/array types works, but having dedicated types is useful, so that one can avoid shoving one kind of hash in place of another, unrelated kind of hash.
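As a rough sketch of what dedicated types buy you, assuming two made-up payload shapes (`UserPayload` and `OrderPayload` are hypothetical, as is `receipt_line`):

```ruby
# Two unrelated payloads that would look alike as plain hashes.
UserPayload  = Struct.new(:id, :email, keyword_init: true)
OrderPayload = Struct.new(:id, :total_cents, keyword_init: true)

# Hypothetical consumer that only makes sense for orders.
def receipt_line(order)
  unless order.is_a?(OrderPayload)
    raise TypeError, "expected OrderPayload, got #{order.class}"
  end
  "Receipt for order #{order.id}: #{order.total_cents} cents"
end

# Convert a parsed JSON hash (string keys) into the dedicated type.
# Unknown keys raise ArgumentError because of keyword_init.
raw   = { "id" => 7, "total_cents" => 1250 }
order = OrderPayload.new(**raw.transform_keys(&:to_sym))

receipt_line(order)  # => "Receipt for order 7: 1250 cents"
# receipt_line(UserPayload.new(id: 1, email: "a@example.com"))  # raises TypeError
```

With plain hashes, both payloads would pass as "a hash with an id" and the mix-up would go unnoticed until a missing key showed up as `nil`.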
As for types themselves in general, there's RBS and Sorbet. One could have type definition generation as well for even deeper static and runtime checks.
In any language I eventually need to validate. Whether I do it early, using a validator, or later, while processing the data, is a choice that depends on the problem.
The existence of a schema definition file, and checking responses against it, signals that I can trust an API vendor to be at least aware of the requirements for clients. (Whether they randomly change the schema definition or ignore it is a second question, but at least somebody once thought about formalising it, and it's not a complete ad hoc dump of today's internal data representation.)
"Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of data which can survive multiple generations of software evolution."
There are protobuf libraries without code generation, for instance: https://github.com/cloudwu/pbc (you lose the connection to the language's type system though).
- strongly typed
- ad-hoc or schema (your choice)
- no code generation step
- edit in text, send in binary