Chrome Returns 206 when the Server Returns 403 (aoli.al)
128 points by aoli-al 471 days ago
7 comments

I have a hard time agreeing with Chromium that this is reasonable. https://www.rfc-editor.org/rfc/rfc9111.html#name-storing-inc...

> A cache MUST NOT use an incomplete response to answer requests unless the response has been made complete, or the request is partial and specifies a range wholly within the incomplete response.

The behavior as described:

1. client requests 1-200; [cached]

2. client requests 1-400; [cache responds with 206 with 1-200]

The cache is able to extend its stored response (which would result in a non-successful 403). But otherwise it MUST NOT use an incomplete cached response to answer a request for a complete answer. Is there room for the cache to pretend it's a server and behave as if it were just the server? The server is allowed to return 206 however it wants, but unless I missed something, why is the cache allowed to yolo out a 206?
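For what it's worth, the "wholly within" condition quoted above is easy to state as code. This is only a minimal sketch in Go, assuming inclusive byte positions; `byteRange` and `whollyWithin` are hypothetical names made up for illustration, not anything taken from Chromium or the RFC.

```go
package main

import "fmt"

// byteRange is an inclusive byte range, as in "bytes=first-last".
// (Hypothetical helper type for illustration.)
type byteRange struct{ first, last int64 }

// whollyWithin reports whether req lies entirely inside cached.
func whollyWithin(req, cached byteRange) bool {
	return req.first >= cached.first && req.last <= cached.last
}

func main() {
	cached := byteRange{1, 200}
	fmt.Println(whollyWithin(byteRange{1, 200}, cached)) // true
	fmt.Println(whollyWithin(byteRange{1, 400}, cached)) // false
}
```

Under this reading, the second request (1-400) is not wholly within the stored 1-200, which is the case I'm questioning.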

edit: additionally, section 3.2 seems to make it even more explicit that this behavior is invalid.

> Caches are required to update a stored response's header fields from another (typically newer) response in several situations; for example.

The ambiguity here is unfortunate because they say "required" here but don't use the keyword MUST.

Section 6.3 says

> A message is considered "complete" when all of the octets indicated by its framing are available.

So in your scenario, the first response is complete, and so the caching behavior does not conflict with the spec.

In the scenario I propose, 200 is less than the 400 requested, so it's incomplete. The cache is permitted to retain the smaller response and return bytes that fall exclusively within it, but like I said, I don't think it's free to return 200 octets when 400 are requested. If it were, why would the spec make the other statements?

I do think the cache is allowed to retain, and respond for the 200 bytes. I don't think it's free to ignore the header updates, nor do I think it's free to return half the requested bytes in lieu of extending the existing cache.

> 200 is less than the 400 requested,

That's irrelevant. Otherwise, a request for 400 bytes against a resource that is actually only 200 bytes long would never be considered complete and could never be cached.

TIL, the HTTP RFC explicitly allows range end to exceed the length of the content:

https://www.rfc-editor.org/rfc/rfc9110#name-byte-ranges
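As I read that section, a last position at or beyond the end of the representation is treated as "the rest of the resource". A rough sketch of that resolution in Go, where `clampRange` is a hypothetical helper name and only the simple `first-last` form is handled (no suffix ranges, no multiple ranges):

```go
package main

import "fmt"

// clampRange resolves an inclusive "first-last" byte range against a
// representation of the given length. ok is false when the range is
// invalid or unsatisfiable (last < first, or first at or past the end).
// Hypothetical helper; suffix and multi-range forms are not handled.
func clampRange(first, last, length int64) (f, l int64, ok bool) {
	if first < 0 || last < first || first >= length {
		return 0, 0, false
	}
	if last >= length {
		// The range end exceeds the content: serve up to the final byte.
		last = length - 1
	}
	return first, last, true
}

func main() {
	f, l, ok := clampRange(0, 399, 200)
	fmt.Println(f, l, ok) // 0 199 true
}
```

So asking for 400 bytes of a 200-byte resource yields a complete, satisfiable response covering bytes 0-199.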

> The Netlog looks scary because it not only contains the traffic while I reproduced the bug but also 1) all traffic from the Chrome plugins and 2) many websites that I have browsed before but haven’t visited during the recording

Isn't that a fantastic use for a 2nd Chrome Profile or even just downloading a Chromium build[1] and using that, showing the behavior in a bleeding edge build?

1: https://download-chromium.appspot.com/

I was also pretty surprised when the OP said "the Chromium team refused to use my server to reproduce the bug", when the actual comments of the ticket were "clone this repo and run my giant node app" and the tester's response was "It seems a bit difficult to set up an build environment to run the static server, could you provide a more minimal repro case?". OP's description of the tester's reasonable concerns seems very unfair.

Even just having a web-accessible endpoint that reproduced the issue would have made the process a lot smoother, I think. Apparently, in response to the request for an easier test case, OP asked for GCP cloud credits(?) to host their server with. You probably used more bandwidth and CPU loading the new Chromium issue tracker page than you would have just setting up a simple VPS to reproduce the issue.

No,

(1) I'm not setting up your server to repro the issue. I have no idea what all that code is going to do. Is it going to try to pwn my machine?

(2) No, I'm not going to use your server as a repro. I have no idea whether you're updating it every 5 minutes with a new version.

There's a reason developers ask for an MCVE (Minimal, Complete, and Verifiable Example).

https://www.google.com/search?q=MCVE

It's not unreasonable to ask for one. Sorry if that sucks for you because it's difficult for you to pare down your code, but that's where we are. Try it on the other side and you'll start to get it.

I think this is responding to the wrong comment.

Yeah, you and the person you are replying to are in complete agreement.

Yeah, that's a bad repro. It took me a couple of minutes to write a concise one.

server.go:

  package main
  
  import (
      "log"
      "net/http"
  )
  
  func main() {
      // Index page: issues two range requests against /test, one after the other.
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          w.Header().Set("Content-Type", "text/html; charset=utf-8")
          w.Write([]byte(`
          <script>
          (async () => {
              await fetch("/test", { headers: { range: "bytes=0-4" } }).then(resp => console.log('bytes=0-4', resp.status));
              await fetch("/test", { headers: { range: "bytes=0-10" } }).then(resp => console.log('bytes=0-10', resp.status));
          })();
          </script>
          `))
      })
      // /test: 206 for the first range, 403 for the second, 400 for anything else.
      http.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
          w.Header().Set("Access-Control-Allow-Origin", "*")
          w.Header().Set("Content-Type", "text/plain; charset=utf-8")
          switch r.Header.Get("Range") {
          case "bytes=0-4":
              // Cacheable partial response: validators plus a Content-Range.
              w.Header().Set("Content-Range", "bytes 0-4/1000000")
              w.Header().Set("Last-Modified", "Mon, 03 Mar 2025 00:00:00 GMT")
              w.Header().Set("Etag", "1234567890")
              w.WriteHeader(http.StatusPartialContent)
              w.Write([]byte("01234"))
          case "bytes=0-10":
              w.WriteHeader(http.StatusForbidden)
              w.Write([]byte("Forbidden"))
          default:
              w.WriteHeader(http.StatusBadRequest)
              w.Write([]byte("Bad Request"))
          }
      })
      // Exit with a message if the port is already taken.
      log.Fatal(http.ListenAndServe(":8080", nil))
  }
Run `go run server.go` and open http://localhost:8080 in the browser. In Chrome, the console should show

  bytes=0-4 206
  bytes=0-10 206
but if we use "disable cache" this becomes

  bytes=0-4 206
  GET http://localhost:8080/test 403 (Forbidden)
  bytes=0-10 403
In both Safari and Firefox the second request is 403, cache or not.

Now, is this surprising? Yes. Does this violate any spec? I'll take the Chromium dev's word [1] and say likely not. Should it be "fixed"? Hard to say, but I agree that "fixing" it could break existing things.

[1] https://issues.chromium.org/issues/390229583#comment16

Chrome's cache is indeed acting correctly. Effectively, it is acting as an intermediary here - your application made a partial content request, and it can satisfy it (partially), so it sends you a 206.

HTTP partial content responses need to be evaluated (like any other response) according to their metadata: servers are not required to send you exactly the ranges you request, so you need to pay attention to Content-Range and process accordingly (potentially issuing more requests).

See: https://httpwg.org/specs/rfc9110.html#status.206

But the Content-Range header and the Content-Length header both indicated the "expected" number of bytes, i.e. the number of bytes that would have been returned if the server had given a 206 or a 200, not the truncated number of bytes that the response actually contained. Is that expected?

The latest response from the Chromium team (https://issues.chromium.org/issues/390229583#comment20) seems to take a different approach from your comment, and says that you should think of it as a streaming response where the connection failed partway through, which feels reasonable to me, except for the fact that `await`ing the response doesn't seem to trigger any errors: https://issues.chromium.org/issues/390229583#comment21

Shouldn't the response header returned by Chrome say "4-138724" then though, and not "4-1943507"? The synthesized response body doesn't include bytes "138725-1943507".

Ah - I need to remember to coffee before posting in the AM.

Yes, the mismatch between the response headers and the content is a problem. Unfortunately, IME browsers often do "fix ups" of headers that make them less than reliable, this might be one of them -- it's effectively rewriting the response but failing to update all of the metadata.

The bug summary says "Chrome returns wrong status code while using range header with caches." That's indeed not a bug. I think the most concerning thing here is that the Content-Range header is obviously incorrect, so Chrome should either be updating it or producing a clear error to alert you -- which it looks like the Chrome dev acknowledges when they say "it is probably a bug that there is no AbortError exception on the read".

I might try to add some tests for this to https://cache-tests.fyi/#partial
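Until something like that exists, an application can defend itself by comparing the body it actually read with the span the Content-Range header promised. A minimal sketch in Go, using the numbers quoted upthread; `checkBody` is a hypothetical name and the byte counts are derived from those ranges, not measured:

```go
package main

import "fmt"

// checkBody returns an error when the number of bytes actually received
// differs from what the inclusive Content-Range span first-last promises.
// Hypothetical helper for illustration.
func checkBody(first, last, received int64) error {
	want := last - first + 1
	if received != want {
		return fmt.Errorf("Content-Range promises %d bytes, got %d", want, received)
	}
	return nil
}

func main() {
	// Header said 4-1943507, but the body only covered 4-138724,
	// i.e. 138724-4+1 = 138721 bytes.
	fmt.Println(checkBody(4, 1943507, 138721))
}
```

A mismatch like this is the signal to treat the response as truncated rather than complete.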

Looks like the cache intended to produce those bytes, got the 403 and thus was unable to, and interrupted the stream. Just like a lost connection.

This is super weird and needs a bit of editing but it seems like an actual bug. Shouldn’t a 403 invalidate whatever was cached?

As in it should bubble the error up to the user.

I think merging two requests opens up a whole can of worms. 200+403 merged translates to 206? There is also Content-Length merging. I wonder what the rest of the headers would translate to. If I respond with a header saying that the stream is EOF in the second call, would that be preserved?

Should it? You can return a partial result for the request, there's no reason it couldn't be a subset of a previous partial request. Why is the browser required to make a network request at all when it can serve a valid (but incomplete) response out of the cache? There's space for argument for what the "best" way to handle this is, but I have a hard time seeing a valid response as "incorrect" or a "bug".

Honestly, this genre of "big tech company refused to fix my very obscure edge case and that confirms all my priors about them" post is getting a little tiresome. There are like three of them coming through the front page every day.

Whether it should or not depends on whether you understand a 403 as a refusal to let you do the given method against the given resource at all, or as a refusal to do this one specific request. The HTTP spec (as I’ve just learned) does support the narrower interpretation if the server wishes it: the description for 403 is just that “[t]he server understood the request, but is refusing to fulfill it”, with no implications regarding other requests for this resource.

Again, it's a range request though. What if the browser simply didn't send a network request at all and just synchronously returned the partial result from the cache. You agree that would be correct (if arguably not very useful), right? The point is that the 403 isn't required to be seen, at all. You can't require the browser return a value that the browser doesn't know about.

It's a cache consistency bug at its root. The value was there, and now it's not. The reporter says "the browser is responsible for cache coherency" (call this the "MESI camp"). The Chrome folks say "the app is responsible for cache coherency" (the "unsnooped incoherent" gang). Neither is wrong. And the problem remains obscure regardless.

I'm the author of the post.

I'm not sure Chrome's current caching behavior is helpful because the second response does not indicate which part of the data is returned. So, the application has no choice but to discard the data.

But thank you for your comments. This helped me to crystallize why I think this is a bug.

Yeah, if there's no way to tell from the response which range has actually been returned, that seems like a deal-breaker. The spec’s allowance for a partial response is explicitly motivated by the response being self-describing, and if after Chrome’s creative reinterpretation it is not, then it’s not clear what the client could even do.

> Honestly, this genre of "big tech company refused to fix my very obscure edge case and that confirms all my priors about them" post is getting a little tiresome.

Ahh, let's just wait for the startup to fix it then.

I'm assuming that the OP is using a signed request and the fact that Chrome rewrites the request is what is causing the 403.

I'm interested in what kind of application depends on this behavior - if an application gets partial data from the server, especially one that doesn't match the content-length header, that should always be an error to me.

Some authentication schemes have a short-lived "authorization token" that is valid for something like 5 minutes and a longer-lived "refresh token" that is valid for a week or two, and they return a 403 when the authorization token expires to prompt the client to refresh the token. (See, say, supertokens.com.) If your auth token expired in the middle of a multi-part download, it could cause the situation that triggered this bug.

So if you get fewer HTTP bytes than expected, then it’s an HTTP response error and you throw the whole thing away. For example, this sort of situation happens when streaming HTTP. The server first has to send the response headers, which would be a simple 200/206, then the data, which could have a much more complicated code path. If there is an error in that data code path, all you can do is close the connection and trigger an HTTP error, since fewer bytes were delivered than advertised. The client needs to detect this and retry. While this may seem uncommon, this is well-understood behavior for HTTP systems.

Or more likely for a range download, you use the bytes you got and keep making further range requests, to get the whole resource in however many tries it takes. And the 403 would come through as soon as you hit an uncached part of the resource.
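That "keep asking for the rest" loop could look roughly like the sketch below. It is Go with the network abstracted away: `chunk`, `fetchRange`, and `download` are hypothetical names, and the fake fetcher in `main` only imitates the repro upthread (bytes 0-4 come back as a 206, anything past that is a 403).

```go
package main

import (
	"errors"
	"fmt"
)

// chunk is what one range request yielded: a status code and the bytes
// starting at the requested offset (possibly fewer than asked for).
type chunk struct {
	status int
	data   []byte
}

// fetchRange stands in for sending "Range: bytes=first-last".
type fetchRange func(first, last int64) (chunk, error)

// download keeps requesting the missing tail until total bytes arrive.
// It returns whatever it has so far together with the first error.
func download(fetch fetchRange, total int64) ([]byte, error) {
	buf := make([]byte, 0, total)
	for int64(len(buf)) < total {
		offset := int64(len(buf))
		c, err := fetch(offset, total-1)
		if err != nil {
			return buf, err
		}
		if c.status != 206 {
			return buf, fmt.Errorf("range request at offset %d: status %d", offset, c.status)
		}
		if len(c.data) == 0 {
			return buf, errors.New("no progress: empty 206 body")
		}
		// Never keep more than was asked for.
		if remaining := total - offset; int64(len(c.data)) > remaining {
			c.data = c.data[:remaining]
		}
		buf = append(buf, c.data...)
	}
	return buf, nil
}

func main() {
	// Fake cache+origin: bytes 0-4 are served, anything beyond is refused.
	fake := func(first, last int64) (chunk, error) {
		if first < 5 {
			return chunk{206, []byte("01234")[first:]}, nil
		}
		return chunk{403, nil}, nil
	}
	got, err := download(fake, 11)
	fmt.Printf("%q %v\n", got, err)
}
```

With a client written this way, the short 206 is used for what it contains and the 403 surfaces on the very next request, at offset 5.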
Unrelated, but I cannot imagine anyone on this site still using Chrome. It's an advertising tool, and with the change to remove uBlock and enforce DRM ever more aggressively, it has so little to offer. I only use it for casting. Hard to see why anyone would use it for anything else at all.

While true for Chrome, I suspect a lot of us (myself included) are using Chromium or some derivative.