by cientifico 1113 days ago
Caching the body scares the hell out of me.

If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

Right now you can:

* Pass the arguments as parameters

* Pass them in the request body. (I have personally done this in APIs for games on Unity/iOS/Android for almost a decade.) Other products like Elasticsearch rely on it as part of the core product.

* Semantically create searches in the server with POST /search

In the previous two examples, you can return a redirect to the search results (like /searches/33) with perfect caching/indexing, and delegate to the server the cache algorithms.
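A minimal sketch of that "create the search, then redirect" flow, using an in-memory store instead of a real web framework. The names `create_search` and `get_search`, the 303 status, and the `Cache-Control` value are assumptions for illustration, not something the spec or any particular server mandates:

```python
import itertools

# Hypothetical in-memory store standing in for a database.
_searches = {}
_next_id = itertools.count(1)

def create_search(criteria: dict):
    """Handle POST /search: store the criteria, redirect to a plain URL."""
    search_id = next(_next_id)
    _searches[search_id] = criteria
    # 303 See Other tells the client to follow up with a GET.
    return 303, {"Location": f"/searches/{search_id}"}

def get_search(search_id: int):
    """Handle GET /searches/<id>: an ordinary, URL-keyed, cacheable response."""
    if search_id not in _searches:
        return 404, {}, None
    # A real server would run the stored search here; this sketch echoes the criteria.
    return 200, {"Cache-Control": "max-age=60"}, _searches[search_id]
```

The body only ever reaches the origin server once; every cache between the client and the server sees a short URL like /searches/1.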

With things like Vary, ETags, conditional fetches, Content-Encoding, Content-Type, Cache-Control, and Expires, which the spec barely touches on, adding a huge body to the cache key is something that a cache server/CDN will not implement.

So again. What is this spec solving?

4 comments

> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

The same way many caching systems work: by hashing the body and using the hash as the cache key.
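As a rough sketch of what that means (the function name `body_cache_key` and the choice of SHA-256 are assumptions, not what any particular cache does):

```python
import hashlib

def body_cache_key(body: bytes) -> str:
    """Return a fixed-length cache key for a request body of any size."""
    return hashlib.sha256(body).hexdigest()

small = body_cache_key(b'{"q": "unity"}')
large = body_cache_key(b'{"q": "' + b"x" * 1_000_000 + b'"}')
# Both keys are 64 hex characters, regardless of how big the body was.
```

Note that this only matches byte-identical bodies; two bodies that mean the same thing but are serialized differently would get different keys.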

Absolutely. I would not recommend using raw search terms as a cache key. It's a good way to a) leak cache data unintentionally if an attacker were to guess at other cache keys (given the cache keys were not namespaced well), and b) leak user search terms (and users often search for some weird stuff, including passwords).

a) Unless you're caching only a single endpoint, which you almost never are, you'd need to have the URL or at least the path be a part of the key too, so that solves the "stealing cache from another app/component" problem. (Also, not having any namespacing is a bad idea regardless, even if using hashes.)

b) Unless your cache keys are publicly listable, this is not a security issue. And from a privacy perspective, GET requests are usually cached by path+params, and since search queries are usually in params these days, again, nothing changes.

That's not to say you shouldn't use cryptographic hash functions for keys, just that nothing really changes with this new verb.
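To illustrate how similar the two cases are, here is a sketch that builds a path-namespaced key either from query params or from a hashed body. The key layout and both function names are made up for this example:

```python
import hashlib
from urllib.parse import urlencode

def key_from_params(path: str, params: dict) -> str:
    """Key for a classic GET: path plus sorted query params."""
    query = urlencode(sorted(params.items()))
    return f"{path}:params:{query}"

def key_from_body(path: str, body: bytes) -> str:
    """Key for a request with a body: path plus a hash of the body."""
    digest = hashlib.sha256(body).hexdigest()
    return f"{path}:body:{digest}"

get_key = key_from_params("/search", {"q": "unity", "page": "2"})
body_key = key_from_body("/search", b'{"q": "unity", "page": 2}')
other_key = key_from_body("/admin/search", b'{"q": "unity", "page": 2}')
```

In both cases the path is part of the key, so the same params or body sent to a different endpoint lands on a different cache entry.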

I've personally discovered a vulnerability due to a lack of namespacing, where token objects were cached using the token's raw value as the key. There was an API with a /whoami endpoint that returned the current token being used. What the API didn't expect was non-token objects to be read from cache, so if you used authn "Bearer users:1", the /whoami endpoint would respond with the user object of the user with ID 1. Redis is also commonly used for non-caching purposes, e.g. config, so this could've also leaked secrets.

Even if the token cache keys were properly namespaced, any cache key with a "token:" prefix would be readable, even if it was used for purposes other than storing a token object. All that would be needed is the key suffix. The remediation of the vulnerability I found included proper cache key namespacing, as well as hashing with an HMAC (since tokens were being stored in plaintext).

So just sharing a real-world scenario where a lack of namespacing (and other caching mistakes) produced a vulnerability.
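A rough sketch of what a remediation along those lines could look like. This is not the actual fix from that system; the secret, the "token:" prefix layout, and the function name `token_cache_key` are all placeholders:

```python
import hashlib
import hmac

# Placeholder secret; a real deployment would load this from secure config.
CACHE_HMAC_SECRET = b"replace-with-a-real-secret"

def token_cache_key(raw_token: str) -> str:
    """Namespaced cache key derived from an HMAC of the token, never the raw value."""
    digest = hmac.new(
        CACHE_HMAC_SECRET, raw_token.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"token:{digest}"

# An attacker sending "Bearer users:1" no longer reaches the "users:1" key.
key = token_cache_key("users:1")
```

Because the suffix is an HMAC, a caller who controls the token value cannot choose which key under "token:" gets read without knowing the secret.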

That's why they're caching the body of requests with a new method.

New and clearly distinct type of requests, new practices.

It isn't just the size of the request that makes people not want to put them in the query string, it's use of the query string over decades.

> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

I think the horse has bolted here. With HTTP/2 (and often without), URLs can be _very_ long.

> how could you use that as a cache key

Hash