by XVincentX 1434 days ago
Disclaimer: I work for Microsoft as an API Architect.

I am not working on this specific API, so I am not going to comment on it. I hear your complaints about the Microsoft API guidelines (which is an entirely different conversation), but I wanted to add my two cents with regard to JSON Schema.

The problem that I have had with JSON Schema since forever is that the data being modeled is complected with the contextuality of its usage. For instance, if I have

type user = { name: string, surname: string, password: string }

in JSON Schema it is very hard to give contextuality to it, and most of the time doing so involves having two separate types.

Here is an example:

If I am creating a new user, then name and surname are mandatory, while password is not, because the system is autogenerating it. If I am logging in, then I want ALL of the fields.

As of today, it is very hard in JSON Schema to express this.

Basically speaking, I am arguing that the data structure is one thing, and its usage in a context is another, where there can be requirements and complicated validation logic that even involve other fields.
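
To make the distinction concrete, here is a minimal TypeScript sketch of the shape defined once, with the "what is required" part kept in a separate per-context table. The `Context` names, the `requiredIn` table and the `missingFields` helper are hypothetical and only there for illustration; this is neither JSON Schema nor spec.

```typescript
// The data structure: defined once, saying nothing about what is required.
type User = { name: string; surname: string; password: string };

// Hypothetical usage contexts, taken from the create/login example above.
type Context = "create" | "login";

// Which fields each context requires. The shape itself is not duplicated.
const requiredIn: Record<Context, (keyof User)[]> = {
  create: ["name", "surname"],            // password is autogenerated
  login: ["name", "surname", "password"], // all of the fields
};

// Return the required fields that are absent from `data` in the given context.
function missingFields(context: Context, data: Partial<User>): (keyof User)[] {
  return requiredIn[context].filter((field) => data[field] === undefined);
}

const draft: Partial<User> = { name: "Ada", surname: "Lovelace" };
console.log(missingFields("create", draft)); // nothing missing
console.log(missingFields("login", draft));  // password is missing
```

The table only covers "required or not"; validation logic that involves other fields would need something richer than a list of keys.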

In my experience, the only thing that has come very close to what I have been looking for when modelling systems is Clojure. Most people laugh in my face when I say that, primarily because it is a LISP 2, and yet... In particular, spec (and even better, spec2) has the tooling to express data structures as sophisticated as we want, without a type system and with the contextuality constraints that are fundamental for real type reuse.

2 comments

I'm not sure what you mean by "it is very hard to give contextuality on it"; OAS does support referring to a type by reference, so that higher-level types can reuse the definitions of the structs they contain.

But even so, the problem here is that the APIs aren't actual PUT/GETs: the payload types aren't the same going up as they are coming down. It is really two separate types, one for PUT, one for GET.

Some of that is to be expected (some information only exists after the create, once the VM has come into being), but I think the way Kubernetes handles this, with a separate "status" for the item, ends up letting the rest of the type (spec, in k8s's case) be the same type in both directions. (… ish. K8s has variants of this problem, too.)
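
A rough TypeScript sketch of that spec/status idea, as I understand it. The `VmSpec` and `VmStatus` fields and the `toPutBody` helper are made up for illustration; they are not the actual Azure or Kubernetes schemas.

```typescript
// Hypothetical fields, for illustration only.
type VmSpec = { size: string; image: string }; // what the client asks for
type VmStatus = { state: "creating" | "running"; privateIp?: string }; // what the server reports

// One envelope for both directions: the client writes `spec`,
// the server fills in `status` once the resource exists.
type Resource<Spec, Status> = { name: string; spec: Spec; status?: Status };

// Round trip: a GET response becomes a valid PUT body by leaving `status` out.
function toPutBody<Spec, Status>(got: Resource<Spec, Status>): Resource<Spec, Status> {
  return { name: got.name, spec: got.spec };
}

const fetched: Resource<VmSpec, VmStatus> = {
  name: "vm1",
  spec: { size: "small", image: "linux" },
  status: { state: "running", privateIp: "10.0.0.4" },
};
const putBody = toPutBody(fetched); // same type, server-owned part omitted
```

The server-added information lives in one optional property, so the rest of the type does not need a GET variant and a PUT variant.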

To expand a bit, I'm largely relegated to the API docs themselves. Browsing the actual schema is hard:

  Start at: https://github.com/Azure/azure-rest-api-specs
  Descend into specification.
  Descend into … so many choices … compute.
  Descend into resource-manager.
  Descend into Microsoft.Compute
  Descend into stable
  Descend into — and this is tricky!
    the latest version isn't the latest version.
    The latest version is 2022-04-04, but for VM creation it's 2022-03-01.
    The only way I know to determine this is to seek backwards, or find it in the docs.
  Descend into ComputeRP
  Descend into virtualMachine.json.
And it's 3.3k LoC! Some of this verbosity is JSONSchema, to be sure… but still. And then you might have to wade back up to common.json, though I forget what circumstances cause me to need to look there.

I’ve debated back and forth on whether it’s a good idea to have separate input and output models for each endpoint. A generic structure that’s usable everywhere makes it really easy to pipe output back to input for a PUT, but it makes it difficult to express constraints like "this field cannot be updated, only created" or "these fields are required, but only for create, and update supports a different subset of the fields again".

I think ideally you want separate structures, but you need tooling that helps you map between the output and input structures automatically (this matters in strongly typed languages; it’s easy in Python or JavaScript), and that’s just lacking currently.
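
For what it's worth, here is a small TypeScript sketch of the kind of mapping I mean. The `UserOutput` and `UserUpdateInput` models, the `updatableKeys` list and the `pick` helper are all hypothetical names for illustration, not an existing tool.

```typescript
// Hypothetical models for one resource; the field names are illustrative.
type UserOutput = { id: string; createdAt: string; name: string; email: string };
type UserUpdateInput = { name: string; email: string };

// The fields a client may send back on update; `id` and `createdAt` are server-owned.
const updatableKeys = ["name", "email"] as const;

// Copy only the listed keys from `source`; the return type is derived from the key list.
function pick<T, K extends keyof T>(source: T, keys: readonly K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    result[key] = source[key];
  }
  return result;
}

// Output-to-input mapping. The compiler checks that the picked
// fields are enough to satisfy UserUpdateInput.
function toUpdateInput(output: UserOutput): UserUpdateInput {
  return pick(output, updatableKeys);
}
```

If a field is removed from `updatableKeys` while `UserUpdateInput` still requires it, `toUpdateInput` stops compiling, which is roughly the safety net I'd want from such tooling.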

Been there too. I'm a big fan of explicit data models (DTOs) for each endpoint, as I think any kind of interface description should be as sound, concise and precise as possible. That means never sending any property that is redundant or unnecessary (i.e. one that gets ignored by the backend). But I do see the problems and the increased engineering effort needed to achieve this, especially since a lot of BE frameworks and languages do not support union and intersection types (looking at you, C# and Java) while OAS does. The end result is usually larger shared DTO models, where any given subset of properties is set to "null" depending on the endpoint the model is used for. Not a clean interface design, but it saves a lot of engineering effort.
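
To illustrate the difference in a language that does have union types, here is a TypeScript sketch. The DTO shapes and the `summarize` function are hypothetical examples, not taken from any real API.

```typescript
// The "larger shared DTO": one model for every endpoint, with unused properties set to null.
type SharedUserDto = { id: string | null; name: string | null; password: string | null };

// Per-endpoint DTOs as a discriminated union: each variant carries exactly what it needs.
type UserRequest =
  | { kind: "create"; name: string }
  | { kind: "rename"; id: string; name: string }
  | { kind: "delete"; id: string };

// Narrowing on `kind` gives each branch a precise type, with no null checks.
function summarize(request: UserRequest): string {
  switch (request.kind) {
    case "create":
      return `create ${request.name}`;
    case "rename":
      return `rename ${request.id} to ${request.name}`;
    case "delete":
      return `delete ${request.id}`;
  }
}
```

With `SharedUserDto`, nothing stops a caller from sending a password to the rename endpoint; with the union, that request does not type-check.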