Hacker News
by paulddraper 1113 days ago
I see the need, and good write up, but just use this for the definition of GET body.

Nothing in the existing spec prevents GET from having a body, though there isn't currently a semantic meaning to it.

This would fit perfectly, be more compatible and result in a simpler spec and protocol.

3 comments

Despite the spec, a lot of clients, load balancers, and server libraries can't handle GET with a body.
They won't support a new method either.
But it being an unsupported method will hopefully at least cause the middlebox to generate a 405 error, rather than undefined behavior.
Rather than a 400 error?
No, rather than the body being silently ignored in the caching (or even proxying!) logic of some middleboxes, because "since GET will never have a body, for efficiency, we won't bother to check for one."
Yet, given enough demand and an explicit nod of recommendation from the spec, these will comply. It's not really as big a problem as everyone seems to think. And it's literally the same amount of work as adding a new verb.
Things might've changed since then, but back in 2009 it was Chromium that disabled bzip2 compression after some ISPs were borking bzip2-compressed pages[1] (although it's not entirely clear if that was indeed the reason for dropping bzip2), not the other way around. Later, in 2016, this was mentioned as the reason for limiting brotli compression to HTTPS only[2].

[1]: https://bugs.chromium.org/p/chromium/issues/detail?id=14801
[2]: https://bugs.chromium.org/p/chromium/issues/detail?id=452335...

So while I agree that it would be nice if everyone respected a "living standard", my hopes for middleboxes to comply are not high.

Which is somewhat surprising given that it is common enough that I've come across it several times in the wild.

So much so that I added support for it to my own server and client libraries. Which means that adding support for QUERY will be trivial (yay!)

As an aside, I also support DELETE with a body.

Caching the body scares the hell out of me.

If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

Right now you can:

* Pass the arguments as parameters

* Pass them in the request body. (I've personally done this in APIs for games on Unity/iOS/Android for almost a decade.) Other products like Elasticsearch count on that as part of the core product.

* Semantically create searches in the server with POST /search

In the previous two examples, you can return a redirect to the search results (like /searches/33) with perfect caching/indexing, and delegate to the server the cache algorithms.
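
A rough sketch of that create-then-redirect flow, using an in-memory store in place of a real server. Everything here is an assumption made for illustration: the `SearchStore` class, its method names, the 303 status code (the comment only says "a redirect"), and the `Cache-Control` value.

```python
import itertools

class SearchStore:
    """In-memory stand-in for a server that stores searches as resources."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_params = {}
        self._params_by_id = {}

    def post_search(self, params: dict) -> tuple:
        """Handle a hypothetical POST /search; returns (status, headers)."""
        # Normalize the parameters so that equal searches map to one resource.
        normalized = tuple(sorted(params.items()))
        if normalized not in self._id_by_params:
            search_id = next(self._next_id)
            self._id_by_params[normalized] = search_id
            self._params_by_id[search_id] = dict(normalized)
        # 303 See Other is an assumption; any redirect status would fit the idea.
        location = f"/searches/{self._id_by_params[normalized]}"
        return 303, {"Location": location}

    def get_search(self, search_id: int) -> tuple:
        """Handle a hypothetical GET /searches/<id>; returns (status, headers, body)."""
        params = self._params_by_id.get(search_id)
        if params is None:
            return 404, {}, None
        headers = {"Cache-Control": "max-age=60"}  # hypothetical cache policy
        return 200, headers, {"query": params, "results": []}

store = SearchStore()
status, headers = store.post_search({"q": "http query", "lang": "en"})
print(status, headers["Location"])  # 303 /searches/1
```

The result lives at a short, plain GET URL, so an ordinary HTTP cache can handle it without ever looking at a request body.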

With things like Vary, ETags, conditional fetches, Content-Encoding, Content-Type, Cache-Control, and Expires that the spec barely grasps, adding a huge body is something that a cache server/CDN will not implement.

So again. What is this spec solving?

> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

The way many caching systems work, by hashing the body and using the hash as the cache key.
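
A minimal sketch of that idea, assuming the key is built from the method, request target, content type, and raw body bytes. The function name and the choice of fields are hypothetical; a real cache would also have to account for `Vary` and similar headers.

```python
import hashlib

def body_cache_key(method: str, target: str, content_type: str, body: bytes) -> str:
    """Derive a fixed-length cache key from a request that carries a body."""
    h = hashlib.sha256()
    parts = (method.upper().encode(), target.encode(), content_type.lower().encode(), body)
    for part in parts:
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()

small = body_cache_key("QUERY", "/search", "application/json", b'{"q": "http"}')
large = body_cache_key("QUERY", "/search", "application/json", b'{"q": "' + b"x" * 1_000_000 + b'"}')
print(len(small), len(large))  # 64 64
```

The key length is constant no matter how large the body is. One caveat of hashing raw bytes: two bodies that mean the same thing but differ in whitespace or key order produce different keys, so a cache would see them as distinct requests unless it normalizes the body first.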

Absolutely. I would not recommend using raw search terms as a cache key. Good way to a) leak cache data unintentionally if an attacker were to guess at other cache keys (given the cache keys were not namespaced well), and b) leak user search terms (and users often search for some weird stuff including passwords).
a) Unless you're caching only a single endpoint, which you almost never are, you'd need to have the URL or at least path be a part of the key too, so that solves the "stealing cache from another app/component" (also not having any namespacing is a bad idea regardless, even if using hashes)

b) Unless your cache keys are publicly listable, this is not a security issue. And from a privacy perspective, GET requests are usually cached by path+params, and since search queries are usually in params these days, again, nothing changes.

That's not to say you shouldn't use cryptographic hash functions for keys, just that nothing really changes with this new verb.

I've personally discovered a vulnerability due to a lack of namespacing, where token objects were cached using the token's raw value as the key. There was an API with a /whoami endpoint that returned the current token being used. What the API didn't expect was non-token objects to be read from cache, so if you used authn "Bearer users:1", the /whoami endpoint would respond with the user object of the user with ID 1. Redis is also commonly used for non-caching purposes, e.g. config, so this could've also leaked secrets.

Even if the token cache keys were properly namespaced, any cache key with a "token:" prefix would be readable, even if it was used for purposes other than storing a token object. All that would be needed is the key suffix. The remediation of the vulnerability I found included proper cache key namespacing, as well as hashing with an HMAC (since tokens were being stored in plaintext).

So just sharing a real-world scenario where a lack of namespacing (and other caching mistakes) produced a vulnerability.
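
A sketch of what namespacing plus HMAC could look like, not the actual fix from that codebase. The secret, the helper names, and the dict standing in for Redis are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be loaded from configuration.
CACHE_KEY_SECRET = b"example-secret-not-for-production"

def token_cache_key(raw_token: str) -> str:
    """Namespaced key whose suffix is an HMAC of the token, never the raw value."""
    digest = hmac.new(CACHE_KEY_SECRET, raw_token.encode(), hashlib.sha256).hexdigest()
    return f"token:{digest}"

def user_cache_key(user_id: int) -> str:
    return f"users:{user_id}"

# A plain dict stands in for the shared cache.
cache = {user_cache_key(1): {"name": "alice"}}

# The attack from the story: present "users:1" as a bearer token.
attacker_key = token_cache_key("users:1")
print(attacker_key in cache)  # False
```

Because the lookup key is always `token:` followed by a keyed digest, a client-supplied string can no longer address entries in another namespace, and without the secret it can't be used to guess other keys under the `token:` prefix either.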

That's why they're caching the body of requests with a new method.

New and clearly distinct type of requests, new practices.

It isn't just the size of the request that makes people not want to put them in the query string, it's use of the query string over decades.

> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

I think the horse has bolted here. With HTTP/2 (and often without), URLs can be _very_ long.

> how could you use that as a cache key

Hash

We'd have to change the behavior of browsers & server implementations to do this. It's a much less risky, much more manageable change to do this with a new, more deliberate difference. It'll make it clear that the new behavior is intended.
> We'd have to change the behavior

But you wouldn't for QUERY?

This is backwards compatible and in many cases will just work since GET with body is already syntactically valid.

> But you wouldn't for QUERY?

No, any good HTTP client or server allows unknown/new HTTP methods. If they are not recognized, they are treated as POST. This is a requirement of the HTTP spec.

I've already used QUERY in a few places, and it basically universally just works. Adding a request body to GET would practically be much harder to deploy and depend on.

A few years ago PATCH was added, and back then there was a bit more friction with some HTTP implementations only allowing a fixed set of methods, but this is mostly not true anymore.

> If they are not recognized, they are treated as POST. This is a requirement of the HTTP spec.

That's not true.

"An origin server SHOULD return the status code ... 501 (Not Implemented) if the method is unrecognized or not implemented by the origin server." https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5....

When you use a framework, and add a route that uses QUERY you 'implement' it. My point is that when you do, it will all just work.
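
To make that concrete, here is a small sketch using Python's standard `http.server` as a stand-in for a framework. Defining `do_QUERY` is what "implements" the method; a method with no handler gets the 501 from the RFC quote above. The `/search` path, the echo response, and the made-up `FROBNICATE` method are assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_QUERY(self):
        # Read the request body and echo it back, to show that it arrived.
        length = int(self.headers.get("Content-Length", 0))
        payload = b"echo:" + self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def send(port, method, body=None):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/search", body=body)
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

query_result = send(port, "QUERY", b"q=http")
other_result = send(port, "FROBNICATE")  # no handler defined for this method
server.shutdown()
server.server_close()

print(query_result[0], other_result[0])  # 200 501
```

This only shows the two endpoints of the connection. It says nothing about how proxies or caches in between treat an unfamiliar method.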
I think they mean we'd have to modify existing features. There are probably assumptions made in code and tests around how GET will be used. Instead of breaking those assumptions and potentially introducing new bugs in an old feature, you can use a completely new HTTP method with completely new code paths. The older feature remains unchanged.
QUERY is an explicit, forward-looking negotiation. It would need support, but nothing in existing systems would change. No web page that accidentally tried to send a GET with a body would start having new behavior.
middleboxes are free to drop the body of a GET request, and many do