

by ejames 5075 days ago

Hmm. I appreciate the idea of a standardized way to modify a JSON document, but I'm not certain what problem it realistically solves. Can someone who's worked with APIs that accept updates give a code example of something that would be better if you accepted patches instead?

Other commenters have brought up size - that the patch must be quite a bit smaller than the whole document in order to gain a benefit - but I'm not sure it's useful even then. JSON is typically the representation of an object for transmission over the wire, not its final representation on the server.

I have a Rails server with a JSON API, but it stores information in a SQL database where the columns roughly correspond to the JSON keys. I'm not actually storing JSON in the database, and the mapping of JSON keys to database columns is not always one-to-one. That means if I want to accept a patch for an object with JSON Patch, I need some way of applying the patch "upstream" to the SQL row that is the source of truth for the object. And, of course, for a database with a schema, only a strictly limited subset of the JSON Patch objects that could be submitted can actually be applied.

The architectural benefit of accepting JSON Patch as an update is that "it's just JSON", but in order to apply a patch on pure JSON, you must have pure JSON and store pure JSON. It's now common for servers to use JSON but that doesn't mean it's "JSON all the way down". In the case of my Rails project, I would need to generate the JSON for an object (or retrieve it from the cache), apply the patch to that JSON, then treat the result as if it had been submitted as an entire document to the update endpoint. Instead of parsing received JSON, I need to generate JSON, process a patch, then parse the result. Whether or not the client has to deal with the entire JSON document, the server has to do so.
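To make that round trip concrete, here is a minimal Python sketch (not Rails, and not a real JSON Patch library). The `JSON_TO_COLUMN` mapping, the column names, and the helper functions are all hypothetical, and the patch handling covers only a top-level "replace" operation, which is an assumption about what a fixed schema could accept.

```python
import json

# Hypothetical mapping from JSON keys to SQL columns (deliberately not identical).
JSON_TO_COLUMN = {"name": "full_name", "email": "email_address"}


def row_to_document(row):
    """Generate the JSON-facing document from a row (a plain dict here)."""
    return {key: row[column] for key, column in JSON_TO_COLUMN.items()}


def apply_patch(document, patch):
    """Apply a small subset of JSON Patch: top-level "replace" only."""
    result = dict(document)
    for operation in patch:
        key = operation["path"][1:]  # "/name" -> "name"
        if operation["op"] != "replace" or key not in result:
            # A fixed schema cannot absorb arbitrary adds, removes, or paths.
            raise ValueError(f"unsupported operation: {operation}")
        result[key] = operation["value"]
    return result


def update_row(row, patch_json):
    """Generate JSON, patch it, then treat the result as a full update."""
    patched = apply_patch(row_to_document(row), json.loads(patch_json))
    updated = dict(row)
    for key, value in patched.items():
        updated[JSON_TO_COLUMN[key]] = value
    return updated
```

Even in this toy version the server builds the whole document before it can touch a single column, which is the cost described above.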

Now moving the burden to the server can actually be a very important improvement if that's where you want to do the work, but I don't know that you need a new standard of JSON document to do that - if you want to perform small updates on large objects, you can write the server such that clients are allowed to omit key-value pairs that are not to be changed when sending JSON to update existing items. Perhaps I'm just dense, but it seems like you only need to iterate through the keys that are included and update only those fields.
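A rough Python sketch of that alternative, assuming a hypothetical `UPDATABLE` mapping of accepted JSON keys to column names and a row represented as a plain dict:

```python
# Hypothetical whitelist: JSON key -> SQL column the client may update.
UPDATABLE = {"name": "full_name", "email": "email_address"}


def partial_update(row, submitted):
    """Update only the fields the client included; omitted keys stay as-is."""
    updated = dict(row)
    for key, value in submitted.items():
        if key not in UPDATABLE:
            raise KeyError(f"field not updatable: {key}")
        updated[UPDATABLE[key]] = value
    return updated
```

For example, `partial_update(row, {"email": "new@x.org"})` changes one column and leaves the rest of the row alone, with no path lookup involved.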

JSON Patch is a defined format for doing that, but it adds a layer of abstraction to the operations, when you actually want clients to be concrete in specifying keys and values so the server doesn't have to perform a lookup to figure out which value is under which path.

Likewise for the concept of "synchronizing" structures, the problem is that you cannot apply a patch correctly unless you start with the same JSON before applying the patch. I don't know if the speed gain from sending patches instead of entire documents justifies the extra reliability work you need to do to make sure the synchronization eventually succeeds. Plus, just as you don't save time in the code to apply a received patch unless a JSON document is the canonical "source of truth" for that object, it takes work to generate a patch for another party to apply unless a JSON document is the "source of truth" and you can cheaply produce the diff in terms of JSON. (Like any document format, you CAN process and produce the data eventually, but the question is whether it's easier than alternatives.)
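One way to picture the extra reliability work is a check that the sender and receiver started from the same document. This Python sketch is only an illustration: the "envelope" shape with `base` and `changes` keys is made up for this example and is not part of JSON Patch.

```python
import hashlib
import json


class StaleBaseError(Exception):
    """Raised when a change set was computed against a different document."""


def fingerprint(document):
    """Hash a canonical serialization so equal documents hash equally."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_if_in_sync(document, envelope):
    """Apply envelope["changes"] only if envelope["base"] matches our copy."""
    if envelope["base"] != fingerprint(document):
        raise StaleBaseError("sender's base document differs from ours")
    return {**document, **envelope["changes"]}
```

When the check fails, the two sides still need some recovery path, such as resending the full document, which is the cost being weighed here.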

It seems like this could be useful for a system where you specifically want to store the changes to an item - i.e. an audit log - but I don't know if JSON Patch is substantially better than existing ways of doing that. Before reading this paper, if I was asked to write a project that stores the changes to a chunk of JSON, I might have done something like insert a \n character after each key-value pair and store the history as commits in git. That's a hack, but since JSON is text and developers already have lots of ways to manipulate text, I'm not sure why it's necessarily better to work with the state changes as "JSON that is changed by events that are also JSON" rather than "text that is changed by events, and the result always parses as JSON", since there are already good tools for working with plain text and evaluating changes to the text.
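As a sketch of the "text that always parses as JSON" approach, the snippet below serializes with one key per line and uses Python's standard `difflib` in place of git. The `text_diff` helper and the sample documents are made up for illustration.

```python
import difflib
import json


def to_lines(document):
    """Serialize with sorted keys and one key-value pair per line."""
    return json.dumps(document, indent=2, sort_keys=True).splitlines()


def text_diff(old, new):
    """Return only the removed and added lines of a unified text diff."""
    diff = difflib.unified_diff(to_lines(old), to_lines(new), lineterm="")
    return [
        line for line in diff
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
    ]


old = {"name": "Ann", "city": "Oslo", "phone": "1"}
new = {"name": "Ann", "city": "Rome", "phone": "1"}
print(text_diff(old, new))
# ['-  "city": "Oslo",', '+  "city": "Rome",']
```

The history here is ordinary line-oriented text, so existing diff and merge tools apply, at the cost of the diff knowing nothing about JSON structure.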

I can see this being useful in a system where you have really set a deep architectural assumption that it's "JSON all the way down", and therefore a JSON Patch can be treated with some level of abstraction rather than just being a well-defined shorthand for sending fewer bytes over the wire. If the system is a document store that considers arbitrary JSON to be the unit of storage, then you don't need to worry about mapping into alternative storage formats. However, that seems like a very specific subset of the architectures in the world of "servers that communicate with JSON".


dexen 5075 days ago

> Can someone who's worked with APIs that accept updates give a code example of something that would be better if you accepted patches instead?

Concurrent editing of tabular data by several users.

A user loads an HTML <form> with a bunch of widgets into her browser, edits some of the data and <submit>s. The usual implementation submits all of the data via an HTTP POST request, as a simple KEY=VALUE dataset. Should two users POST to the same resource (same URI) concurrently, it is not clear which values are real changes and which are unchanged, nor how to apply them to the resource.

What I'd like to do -- and indeed, some parts of my software do, albeit in a non-standard way -- is submit a diff, in the form of KEY=<original-value, new-value>. That makes it easier for the backend to serialize users' changes, and either apply them atomically or reject just the conflicting ones.
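A minimal Python sketch of that KEY=<original-value, new-value> idea, assuming the resource is a flat dict and each submission is a dict of `(original, new)` tuples. The `apply_diff` function and the field names are hypothetical, and this version rejects only the conflicting keys rather than the whole submission.

```python
def apply_diff(current, diff):
    """Apply each change whose original value still matches the resource.

    Returns the updated resource and the list of conflicting keys.
    """
    updated = dict(current)
    rejected = []
    for key, (original, new) in diff.items():
        if current.get(key) == original:
            updated[key] = new
        else:
            rejected.append(key)  # someone else changed this field first
    return updated, rejected


# Both users loaded the same form; Alice's submission arrives first.
resource = {"phone": "111", "city": "Oslo"}
alice = {"phone": ("111", "222")}
bob = {"phone": ("111", "333"), "city": ("Oslo", "Rome")}

resource, _ = apply_diff(resource, alice)
resource, conflicts = apply_diff(resource, bob)
print(resource, conflicts)
# {'phone': '222', 'city': 'Rome'} ['phone']
```

Unchanged fields never appear in the submission, so the backend no longer has to guess which posted values were real edits.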


ejames 5075 days ago

Thanks for the example. Receiving a patch does let you know just what change the user intended to make, and often you actually care about the change, not the before or after.

As I keep looking at this, it seems that it's most valuable when you consider the patch as an element of an event-based system, where you're storing or operating on changes rather than just the data. The reason it helps with a bulk editing process is that you want to know what changes Alice and Bob made if they both submit at the same time - operating on diffs simplifies the system. For other systems where you operate on diffs, like an audit log, it's helpful to have a first-class format that defines the diff on a JSON document.

I think that's the biggest caveat I'm seeing with the proposal. It's pitched as a way to handle HTTP PATCH requests, but a format that defines "diffs on JSON documents" is most suited for systems that primarily store JSON documents and diffs on JSON documents - it is much less broadly applicable for "HTTP servers that accept JSON data", since frequently they only work with JSON over the wire, not as the ultimate source of truth. If you don't store JSON documents as the ultimate source of truth or have a domain-model reason to want diffs, I'm unsure of the advantage.


ender7 5075 days ago

Agreed, I'm excited for this format, but only because I store almost all of my state in JSON blobs.
