by jonathanaird 1943 days ago
I really want something that’s strongly typed but doesn’t require code generation like protobufs do. YAML doesn’t do it for me. The closest I can get is putting the type guarantees in the database and using GraphQL.

I made https://concise-encoding.org/ to deal with this:

- strongly typed

- ad-hoc or schema (your choice)

- no code generation step

- edit in text, send in binary

Why would someone choose this rather than msgpack or CBOR or protobuf or any of the other existing things in that space?
Because there isn't anything else in this space that:

- supports ad-hoc data structures or schemas per your preference

- supports all common types natively (doesn't require special string encoding like base64 or such nonsense)

- supports comments, metadata, references (for recursive/cyclical data), custom types

- doesn't require an extra compilation step or special definition files

- has parallel binary and textual forms, so you're not wasting CPU and bandwidth serializing/deserializing text. Everything stays in binary except in the rare cases where humans want to look at or edit the data.
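
To make the base64 point in the list above concrete, here is a small sketch of what happens when binary data has to travel inside JSON, which has no byte-string type. It uses only Python's standard library and does not show Concise Encoding's own API; the `blob` field name is made up for illustration.

```python
import base64
import json

payload = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data

# JSON has no byte-string type, so the bytes must be wrapped in a string encoding.
encoded = base64.b64encode(payload).decode("ascii")
document = json.dumps({"blob": encoded})  # "blob" is a hypothetical field name

# Base64 emits 4 output characters for every 3 input bytes (rounded up).
expected_b64_len = 4 * ((len(payload) + 2) // 3)
overhead = len(encoded) / len(payload)  # roughly 1.33x before any JSON framing

# The receiver must know out of band that "blob" is base64 rather than plain text.
decoded = base64.b64decode(json.loads(document)["blob"])
```

A format with a native bytes type avoids both the size inflation and the out-of-band convention.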

That looks pretty good, actually.
I use JSON Schema to validate JSON documents.

https://json-schema.org/

IMHO, statically typed languages are the ones that benefit most from a schema. The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7. None of them support codegen either, just validation, so not exactly compelling.
> The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7

It's actually 2020-12, which is two versions after Draft 7 (they shifted from Draft n to YYYY-MM after Draft 7, and since then have had 2019-09 and 2020-12.)

And that's true of most languages, though there is some 2019-09 support. (It really doesn't help that there is also OpenAPI which baked in a variant—“extended subset”—of Draft 5 JSON Schema.)
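
For reference, the draft sequence described above can be sketched in a few lines of Python. The `$schema` URIs are the published meta-schema identifiers for these three drafts; the helper names `declared_draft` and `releases_between` are invented for this example.

```python
# Release order of the JSON Schema drafts mentioned in this thread.
DRAFT_ORDER = ["draft-07", "2019-09", "2020-12"]

# Meta-schema URIs a schema document can declare in its "$schema" keyword.
# Trailing "#" is stripped before lookup, since Draft 7 URIs usually carry one.
SCHEMA_URIS = {
    "http://json-schema.org/draft-07/schema": "draft-07",
    "https://json-schema.org/draft/2019-09/schema": "2019-09",
    "https://json-schema.org/draft/2020-12/schema": "2020-12",
}


def declared_draft(schema):
    """Return the draft named by "$schema", or None if absent or unknown."""
    uri = schema.get("$schema", "").rstrip("#")
    return SCHEMA_URIS.get(uri)


def releases_between(older, newer):
    """Count how many releases separate two drafts in DRAFT_ORDER."""
    return DRAFT_ORDER.index(newer) - DRAFT_ORDER.index(older)
```

Here `releases_between("draft-07", "2020-12")` returns 2, matching the "two versions after Draft 7" count.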

OpenAPI 3.1, which was released recently, uses JSON Schema 2020-12 as the primary schema format. As a result, we can expect further consolidation of tooling, etc. in the community.
I benefit greatly from schema validation in Ruby. It ensures that ingress-processing code does not receive, e.g., a String or Hash instead of an Array. Without it, things blow up well past the ingress edge when a call to an Array method fails, or worse, the code produces silently broken behaviour that may or may not blow up even farther down the road, because both String and Hash respond to e.g. #[](Integer).
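
The same hazard exists outside Ruby. As a rough Python analogue (not the Ruby code the comment refers to), `list`, `str` and `dict` all accept `value[0]`, so a wrong type can slip through unnoticed unless it is rejected at the edge. The function names are made up for this sketch.

```python
def first_item(items):
    # Works for list, str, and dict alike, which is exactly the problem:
    # a wrong type does not fail here, it just returns something unexpected.
    return items[0]


def validate_items(value):
    """Reject anything that is not a list at the ingress edge."""
    if not isinstance(value, list):
        raise TypeError(f"expected a list, got {type(value).__name__}")
    return value


good = first_item(validate_items(["a", "b"]))  # "a", as intended
silent_str = first_item("ab")                  # also "a", but from a string
silent_dict = first_item({0: "x"})             # "x", from a dict keyed by 0
```

Calling `validate_items` first turns the two silent cases into an immediate `TypeError` at the boundary.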
Yeah but Ruby is a dynamically typed language. There's not much benefit to codegen since nothing is checked at compile time anyway.
I found code generation to be useful in Ruby with protobuf. This:

https://github.com/lloeki/ruby-skyjam/blob/master/defs/skyja...

gives that:

https://github.com/lloeki/ruby-skyjam/blob/master/lib/skyjam...

I would certainly enjoy having a DSL to write descriptive code to validate using JSON schema, but it would be even better if the Ruby definitions could be generated and persisted in Ruby files using that DSL.

Also, storing things in basic hash/array types works, but having dedicated types is useful, so that one can avoid shoving one kind of hash in place of another, unrelated kind of hash.
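
A minimal sketch of the "dedicated types" idea, written in Python rather than Ruby and using invented names (`Track`, `Playlist`, `total_duration_ms`); it is not taken from the linked repository. Raw dicts are converted once, and later code refuses anything that is not the expected type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    duration_ms: int

    @classmethod
    def from_dict(cls, raw):
        # Convert a raw dict once, checking field types at the boundary.
        if not isinstance(raw.get("title"), str):
            raise TypeError("title must be a string")
        if not isinstance(raw.get("duration_ms"), int):
            raise TypeError("duration_ms must be an integer")
        return cls(title=raw["title"], duration_ms=raw["duration_ms"])


@dataclass(frozen=True)
class Playlist:
    name: str
    tracks: tuple


def total_duration_ms(playlist):
    # A plain dict that merely looks similar is rejected here.
    if not isinstance(playlist, Playlist):
        raise TypeError(f"expected Playlist, got {type(playlist).__name__}")
    return sum(track.duration_ms for track in playlist.tracks)
```

With raw dicts, a track and a playlist are both just "a hash"; with dedicated types, passing one where the other is expected fails right away.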

As for types themselves in general, there's RBS and Sorbet. One could have type definition generation as well for even deeper static and runtime checks.

Do you really want generated code to manipulate JSON? I'm not sure there is a demand for that.
Manipulating anything dynamic in a statically typed language is generally tedious and not type safe, so yes.
In any language I eventually need to validate. Whether I do it early, using a validator, or later, while processing the data, is a choice that depends on the problem.

The existence of a schema definition file, and checking responses against it, signals that I can trust an API vendor to be at least aware of the requirements for clients. (Whether they randomly change the schema definition or ignore it is a second question, but at least somebody once thought about formalising it, and it's not a complete ad hoc dump of today's internal data representation.)

And there's Amazon Ion too - https://amzn.github.io/ion-docs/

"Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of data which can survive multiple generations of software evolution."

Strongly typed + no code generation is obviously doable in any dynamically typed language.

Apache Avro has support for parsing and utilizing schemas at runtime, even in C++.
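
To illustrate what "utilizing schemas at runtime" can look like without any generated code, here is a small sketch that loads an Avro-style record schema as plain JSON and checks records against it. It is a hand-rolled checker covering only a few primitive types, not the Avro library or its API; `check_record` and the `User` schema are made up for the example.

```python
import json

# An Avro-style record schema, kept as data rather than compiled into code.
SCHEMA_JSON = """
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "int"}]}
"""

# Only a small subset of primitive type names is handled in this sketch.
PRIMITIVES = {"string": str, "int": int, "boolean": bool, "double": float}


def check_record(schema, record):
    """Return a list of error strings; an empty list means the record conforms."""
    errors = []
    for field in schema["fields"]:
        name, type_name = field["name"], field["type"]
        if name not in record:
            errors.append(f"missing field {name!r}")
            continue
        value = record[name]
        expected = PRIMITIVES[type_name]
        # bool is a subclass of int in Python, so exclude it explicitly for "int".
        ok = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool)
        )
        if not ok:
            errors.append(
                f"field {name!r}: expected {type_name}, got {type(value).__name__}"
            )
    return errors


schema = json.loads(SCHEMA_JSON)
```

The trade-off is the one raised elsewhere in the thread: the check happens at runtime, so a statically typed host language gets no compile-time help from it.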

For Apache Thrift you have things like thriftpy: https://thriftpy.readthedocs.io/en/latest/

I'm not aware of a type-safe mechanism for Flatbuffers or Protocol Buffers.

There are protobuf libraries without code generation, for instance: https://github.com/cloudwu/pbc (you lose the connection to the language's type system though).