Hacker News new | ask | show | jobs
by anderspitman 2396 days ago
One use case where I've put (uri-encoded) JSON in the query string is for parameterizing GET requests with more structured data, since URL encoding is fairly limited and GET requests don't have a body like POSTS.
3 comments

You can probably use extended URL encoding for that, which is supported by at least body-parser [https://npmjs.com/package/body-parser]. Not sure about other PLs.
Can't you just base64 url safe encode it and get the same result? Aren't URl strings limited in their length as well?
As far as I know base64 is not guaranteed to be uri safe, though most of the time you should be fine. However more importantly converting to base64 automatically increases the size by 33%

https://developer.mozilla.org/en-US/docs/Web/API/WindowBase6...

The parent poster was referring to the "base64url" variant of Base64 which uses `-` and `_` as the two special characters and leaves out the useless padding, making it url safe. (The 33% expansion still applies though)

https://tools.ietf.org/html/rfc4648#page-7

As others mentioned, size is the main concern here, which is also why JSONCrush could be useful. But I'd definitely use POSTs first if possible.
If you don't care about IE and Edge you can fit 8KB-64KB in the URI, otherwise 2K is pretty safe.
It may not apply in your case, but in some cases, you can give a GET request a body - Elasticsearch relies on this.
From what I've seen, many intermediaries do nasty things if you have a body on anything that doesn't 'normally' have one. I've heard a fair amount about GCP's load balancers dropping unexpected bodies, and other tools giving a 5XX.

It is really unfortunate, because there are tons of use cases and zero reason to interfere with these requests.

The justification is also downright ridiculous. The argument is that "GET, DELETE, ... have no defined semantics for bodies". Meanwhile the 'defined semantics' for POST bodies is... whatever the application decides.

I made this exact argument in session last week in the HTTP working group at the IETF while discussing the semantics of DELETE. The working group chair is quite passionate defender of the position that GET should never have a body, and in the end I was outvoted by the room.

I'm slowly coming around to the idea that he might be right. The problem is the (semantic) question of what resource is being discussed. The semantics of GET, HEAD and OPTIONS are that (unless it gets modified) the same resource should always be answered the same way. If the resource that we're asking about is identified by the URL + the body, then those requests (and DELETE) all need a body too. And then there's an open question for PUT and POST about what resource exactly is being modified by the request - although as you say, the semantics are whatever so thats less important.

I think HTTP has a problem in that its hard to express a resource name that is complex. Like, "the records which match this specific elasticsearch query" is a hard thing to pack into a URL. If HTTP was different, I could imagine a second body-like part of each request called the "resource section" with its own Resource-Type: application/json header and so on. But instead we just have a URL, so we end up doing awful hacks like packing JSON into URLs and make them unreadable. The URL is long enough for this sort of thing - implementations have to allow at least 8k of space for them, and can allow more space. But they're hard to read and display, and there's no standard way to pack json in there.

So I wonder it'd be worth having a formal, consistent way to encode stuff like this in the URL. I'm spitballing, but maybe we need a standard way to encode JSON into the URL (base64 or whatever) defined in an RFC. Then if you do that, you can add a "Query-Type: application/json" header that declares that the URL contains JSON encoded in the standard way. Then we could at least have consistency and nice tooling around this. So like, when you're making a request you could just pass JSON into your http library and it'd encode it into the URL automatically in the standard way. And when the URL is being displayed in the dev tools or whatever, it could write out the actual embedded JSON / GraphQL / whatever object that you've packed in there in an easy to read way.

I don't have too big of a probably with avoiding GET bodies. What I do have a problem with is there's no way to POST an application/json body cross-origin without a preflight, which kills latency. So I'm forced to resort to text/plain or similar hacks.
The OPTIONS response should be cached by the browser if the headers are being set correctly in the response. If you want to avoid the latency cost you can proxy the POST request via your own backend, though you can't send authentication credentials from the user's browser to the 3rd party site that way.

Using text/plain only exists as a backwards compatibility thing. I wouldn't be surprised if that stops working at some point, since it breaks the security model.

Won't proxying the request just introduce its own latency? Either way you get two round trips. Is it measurably better?