Hacker News
by Too 1118 days ago
The part where a cache SHOULD have “knowledge of the semantics of the content itself” in combination with “normalization is performed solely for the purpose of generating a cache key; it does not change the request itself” is the scary part.

It may sound cool and efficient on paper: just trim the whitespace and sort all JSON dictionaries, right? But in practice it adds too much complexity, and eventually the cache's and the real backend's implementations of these semantics will start to drift apart. Case in point: SAML XML signatures.

This is how one creates a cache poisoning vulnerability. If a request is normalized to produce the cache key, send that same normalized request to the backend as well. If you don’t trust the normalization enough for that, you shouldn’t trust it for the cache key either.
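
A minimal sketch of that rule, assuming JSON bodies and an in-memory dict as the cache. The names `normalize`, `handle`, and `backend` are hypothetical, chosen only for illustration:

```python
import hashlib
import json


def normalize(body: bytes) -> bytes:
    """Canonicalize a JSON body: sorted keys, no insignificant whitespace."""
    parsed = json.loads(body)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def handle(body: bytes, cache: dict, backend) -> bytes:
    """Serve from cache, or forward the *normalized* request to the backend."""
    canonical = normalize(body)
    key = hashlib.sha256(canonical).hexdigest()
    if key not in cache:
        # Forward the canonical bytes, not the original ones, so the backend
        # sees exactly the input the cache key was derived from.
        cache[key] = backend(canonical)
    return cache[key]
```

Because the backend only ever receives the canonical form, a disagreement between the two parsers can no longer map one key to two different backend inputs. Note that Python's `json.loads` silently keeps the last value for a duplicate key, which is exactly the kind of detail where two implementations can drift.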

Proxies should be dumb, just hash the raw string for the cache key.
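
For comparison, the "dumb" variant is tiny. This is only a sketch, assuming the proxy has the raw request bytes at hand; `raw_cache_key` is a hypothetical name:

```python
import hashlib


def raw_cache_key(raw_request: bytes) -> str:
    """Hash the request bytes exactly as received, with no parsing at all."""
    return hashlib.sha256(raw_request).hexdigest()
```

The trade-off is a lower hit rate: two bodies that differ only in whitespace or key order get different keys, but nothing in the proxy can disagree with the backend about what the request means.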

1 comment

This. Plus, it is a good idea to specify a minimum recommended hash algorithm, to set manageable expectations about collisions: "The cache key collision rate is guaranteed to be no worse than that of SHA-256".
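
To put a rough number on that expectation, here is a sketch of the birthday-style union bound. It assumes the hash behaves like an ideal random function with a 256-bit output, which is an idealization rather than a proven property of SHA-256; `collision_bound` is a hypothetical helper:

```python
from fractions import Fraction


def collision_bound(n_keys: int, hash_bits: int = 256) -> Fraction:
    """Upper bound on P(any two of n_keys distinct inputs share a hash).

    Union bound over all pairs: C(n, 2) / 2**hash_bits.
    """
    pairs = n_keys * (n_keys - 1) // 2  # exact: n * (n - 1) is always even
    return Fraction(pairs, 2 ** hash_bits)
```

Under that assumption, even 2**64 distinct cached requests keep the bound below 2**-128.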