Hacker News
by wongarsu 48 days ago
Back in the day it was reasonably common for CMSs and forums to have only an index.php and to route entirely by query string (in form-urlencoded form; people were not savages). So you would have index.php?p=home and index.php?p=shop, or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters.
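To make the index.php-only scheme concrete, here is a minimal Python sketch of a front controller that routes purely on the `p` query parameter. `PAGES`, `front_controller`, and the default page `home` are hypothetical; the point is only that when the query string is the route, an unknown value has no resource behind it, so a 404 falls out naturally.

```python
from urllib.parse import parse_qs

# Hypothetical page handlers; a real CMS would render templates here.
PAGES = {
    "home": lambda params: "Welcome home",
    "shop": lambda params: "Shop front",
}


def front_controller(query_string: str) -> tuple[int, str]:
    """Dispatch on ?p= the way an index.php-only site might."""
    params = parse_qs(query_string)
    # parse_qs returns lists of values; take the first "p", defaulting to "home".
    page = params.get("p", ["home"])[0]
    handler = PAGES.get(page)
    if handler is None:
        # The query string *is* the route, so an unknown page is a missing resource.
        return 404, f"No page named {page!r}"
    return 200, handler(params)


print(front_controller("p=shop"))   # (200, 'Shop front')
print(front_controller("p=forum"))  # (404, "No page named 'forum'")
```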

In fact, lots of sites still work like that; they just hide it behind a couple of rewrite rules in Apache/nginx for SEO reasons.
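The rewrite-rule trick can be emulated in a few lines. This Python sketch maps a pretty path such as `/shop` onto `/index.php?p=shop` and appends any existing query string, roughly what an Apache `RewriteRule` with the `[QSA]` flag would do. The regex and the `rewrite` function name are illustrative assumptions, not any server's actual configuration.

```python
import re

# Hypothetical rule: a single lowercase path segment becomes the "p" parameter.
PRETTY_PATH = re.compile(r"^/([a-z0-9_-]+)/?$")


def rewrite(path: str, query: str = "") -> str:
    """Map /shop (optionally with a query string) onto /index.php?p=shop."""
    match = PRETTY_PATH.match(path)
    if match is None:
        # Leave anything else (static files, deeper paths) untouched.
        return f"{path}?{query}" if query else path
    target = f"/index.php?p={match.group(1)}"
    # Similar in spirit to [QSA]: keep the original query string.
    return f"{target}&{query}" if query else target
```

For example, `rewrite("/shop/", "page=2")` yields `/index.php?p=shop&page=2`, so the application still sees a plain query-string route.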

10 comments

At the risk of naming an eldritch horror: IIRC it was ColdFusion that first adopted something like an MVC-in-querystring routing system in the late 90s or early 00s, and that eventually spread when FCGI caught on and users of other languages got used to long-running middleware processes. It seemed hella elegant at the time.
Tangent: much like PHP, "modern" CF isn't actually that bad to work with these days. In particular, the superset-of-HTML syntax has been superseded for pure logic by CFScript, which is essentially an ECMAScript dialect.

There's even a package manager, test harness, etc. And of course it's JVM-hosted, so it's fairly easy to use Java stuff (stdlib or otherwise) if what you need doesn't exist in CF.

It's funny to read this like it's archaic knowledge, this is my base mental map of how nicer looking URLs work :)
Only when you're using something more or less file-system-mapped, like Apache.

When the "server" is part of the application, you have a richer routing layer that you can do whatever you want with.

If you're routing like it's 1999, sure, 404.

On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if it's an API) makes more sense than a 404, which would be more appropriate for an attempt to pull up a nonexistent entity URI.

There is no reason you can't return that "no items matched your selection" with a 404 HTTP response code instead of a 200.
You can return whatever HTTP response code you want, but if you care about knowing whether your site is working, being able to look at the logs and tell "that user requested a page that doesn't exist" apart from "that user requested a page that exists but had no results" is quite useful. In coding terms, it's the difference between a null and an empty array.
You can do that with filtering, which should be a feature of every single logging tool.

Anyway, I agree that when you filter via the query string, an empty list is a more valid response than a 404. IMHO, that HTTP status should be returned when the requested item (for example, by ID) is not found (and, of course, for wrong paths, etc.).

In this case I don't think the status should depend on the number of results. Here are your results: [] is a valid response body when there are no results. Returning 404 if there are no results (GET /books?title=a, for instance) is misleading: the caller may think that /books is a nonexistent route and conclude that books are reachable via another URI. To me, the query string has no influence on the response status.

/books/1 could return 200 or 404 depending on whether book #1 exists; here it makes sense, because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4XX family, which means "client error". Is it an error to ask for a nonexistent book? If you walk into a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially one with hypermedia, you are not supposed to request a resource that does not exist (unless the API provided a link to a resource that was deleted before the caller tried to reach it).
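A minimal sketch of the distinction drawn in these comments, assuming a hypothetical in-memory `BOOKS` store and case-insensitive substring matching on `title`: the collection endpoint always answers 200 (possibly with `[]`), while the item endpoint answers 404 only when that specific book is missing.

```python
from typing import Optional

# Hypothetical in-memory store standing in for a database table.
BOOKS = {1: {"id": 1, "title": "Dune"}, 2: {"id": 2, "title": "Emma"}}


def list_books(title: Optional[str] = None) -> tuple[int, list[dict]]:
    """GET /books?title=...: the collection exists even when nothing matches."""
    matches = [
        book for book in BOOKS.values()
        if title is None or title.lower() in book["title"].lower()
    ]
    return 200, matches  # [] is a valid body; the status ignores the count


def get_book(book_id: int) -> tuple[int, Optional[dict]]:
    """GET /books/{id}: a specific resource that may not exist."""
    book = BOOKS.get(book_id)
    return (200, book) if book is not None else (404, None)
```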

If you enter a bookshop and ask for a book that does not exist, then it's definitely your mistake.

If you ask for a book they don't have it's a different matter.

In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent of opening a books/1 URL would be asking for a specific copy of a book by serial number or similar. Then it's clear that you made a mistake if you do that for a nonexistent serial number...

A response code of 204 seems more appropriate, but the problem is that you're not allowed to send a body with further information, which would make that descriptive response... not descriptive enough.
Code 204 is just code 200 with the "yes the body really is zero bytes this is not an error it's supposed to be like this" bit set.
I think of it like this:

If /users/ returns a 404 in an API, it means that this resource does not exist. As in, it is not part of the API.

If /users/123 returns a 404, it means this user record does not exist.

Yes, this means that a 404 is context-dependent, but in a way that makes it easier for a human to think about and reason about.

Yes, and this is obvious if /users/ exists and returns a 400 when the ID is required. That way you can tell the difference between /users/ being there and expecting an ID, and it not being there at all.
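A rough sketch of the `/users/` scheme described above, with a hypothetical `USERS` store and `handle` function: a path outside the route table is a 404 for the endpoint, a missing ID on an existing endpoint is a 400, and a well-formed ID with no record is a 404 for the record. Treating a non-numeric ID as 400 is an extra assumption.

```python
import re

USERS = {123: {"id": 123, "name": "alice"}}  # hypothetical data

# Hypothetical route table: only /users/<id> (with a possibly empty id) exists.
USER_ROUTE = re.compile(r"^/users/(?P<id>[^/]*)$")


def handle(path: str) -> tuple[int, str]:
    match = USER_ROUTE.match(path)
    if match is None:
        return 404, "No such endpoint"  # not part of the API at all
    raw_id = match.group("id")
    if raw_id == "":
        return 400, "A user ID is required"  # endpoint exists, request incomplete
    if not raw_id.isdigit():
        return 400, "User ID must be numeric"
    user = USERS.get(int(raw_id))
    if user is None:
        return 404, f"User {raw_id} does not exist"  # endpoint exists, record doesn't
    return 200, user["name"]
```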
Of course it is technically possible, but doing so would violate the spec.

> The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

In the above case, the server _is_ returning a representation.

https://datatracker.ietf.org/doc/html/rfc9110#name-404-not-f...

Another reason not to return a 404 in that case: chances are there will be monitoring tooling in place that treats a 404 as an "error" and surfaces it in your alerting, which is not ideal; it will just be noise.
The point was that returning a 404 for unexpected query strings doesn’t just happen to be okay per the specs; there is significant historical precedent for doing so, based on application design that was common in the past.

    204 No Content
for nothing found is both not an error (because it's a 2xx code) and an indication that nothing was found to match the request.

If it's an API, a 200 with an empty JSON object or array in the body is legitimate as well, but a 204 is explicit.

My rule of thumb is that if you want to keep your code clean, always returning an empty collection is preferable to returning an empty response on that branch. You don't need a guard clause to null/undef-check before consuming the result. The rule applies whether we're consuming the response from a repository or an http request.
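To illustrate the rule of thumb, here are two hypothetical lookup functions over a made-up `TAGS` table: one returns `None` when nothing is found and forces a guard clause on every caller, the other always returns a list so callers can consume it directly.

```python
from typing import Optional

# Hypothetical data source.
TAGS = {1: ["http", "rest"]}


def tags_or_none(post_id: int) -> Optional[list[str]]:
    """Style A: None signals "nothing found"."""
    return TAGS.get(post_id)


def tags_or_empty(post_id: int) -> list[str]:
    """Style B: always a list; an empty list signals "nothing found"."""
    return TAGS.get(post_id, [])


def count_a(post_id: int) -> int:
    tags = tags_or_none(post_id)
    if tags is None:  # guard clause required before consuming the result
        return 0
    return len(tags)


def count_b(post_id: int) -> int:
    return len(tags_or_empty(post_id))  # no guard needed: len([]) is 0
```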
This too is not spec compliant. 204 means the request was successful but no body is being returned in the response.
Which is the equivalent of nothing found matching the request in a collection.

The alternative is basically 200 OK

followed by a JSON body of:

[]

Yeah, empty response at a valid path. Isn’t 204 the code for that?

Lots of REST libraries that I’ve used treat any 4xx response as an error, so generating a 404 for an empty list would just create more headaches.

Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.

Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.

Completely agree on the axios part. One implication is that you can't statically type the error response shapes (since exceptions can't be typed), whereas with fetch you can have a discriminated union based on the status code (e.g. https://github.com/mnahkies/openapi-code-generator/blob/main...)
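The linked generator targets TypeScript; as a loose analogue only, this Python sketch turns a status code and body into one of several result types instead of raising, so callers branch on the type rather than catching exceptions. `Ok`, `NotFound`, `Unexpected`, and `to_result` are hypothetical names, and the sketch assumes the API sends a JSON body with a `message` field on 404.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Ok:
    items: list


@dataclass
class NotFound:
    message: str


@dataclass
class Unexpected:
    status: int
    body: str


Result = Union[Ok, NotFound, Unexpected]


def to_result(status: int, body: str) -> Result:
    """Turn a status code and body into a value instead of raising on 4xx/5xx."""
    if status == 200:
        return Ok(json.loads(body))
    if status == 404:
        # Assumes a JSON error body like {"message": "..."}.
        return NotFound(json.loads(body).get("message", ""))
    return Unexpected(status, body)


result = to_result(404, '{"message": "no such list"}')
if isinstance(result, NotFound):
    print("missing:", result.message)
```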

Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.

Generally true, although 429 is often used for rate limiting, so a backoff and retry is appropriate. 409, 412, and 428 may also be retriable depending on the specific semantics of the given situation. 421 apparently shows up commonly with HTTP/2 connection reuse and is retriable. Potentially 423 and 425 too.

It would have been nice if there were an actual grouping of retriable and non-retriable codes, but in reality it’s a complete mess.

But at a minimum, beware of 429. It’s not a permanent outage; it’s a common response that needs a careful retry.
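A minimal retry sketch along those lines, assuming the caller supplies a hypothetical `send` function that returns only the status code. The retriable set here contains just 421, 423, 425, and 429; whether 409, 412, or 428 belong in it depends on the API, as noted above. Delays use exponential backoff with full jitter, and `sleep` is injectable so the logic can be exercised without waiting.

```python
import random
import time
from typing import Callable

# Statuses this sketch treats as retriable; adjust per API semantics.
RETRIABLE = {421, 423, 425, 429}


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retriable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status = send()
        attempt += 1
        if status not in RETRIABLE or attempt >= max_attempts:
            return status
        # Full jitter: wait a random time up to base_delay * 2^(attempt - 1).
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

A non-retriable status such as 404 is returned immediately, matching the point that most 4xx responses should not be retried as-is.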

204 might be acceptable if you aren’t returning an entity body to describe what is missing, but do wish to indicate the request was successful.
I think the author is comfortable creating headaches for people tacking query strings onto URLs
> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

That's not obvious at all. If I receive JSON data that contains a property I'm not aware of, I don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire request because someone somewhere else is trying to pass information to itself is the wrong approach.
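For the lenient approach, a minimal sketch: read only the parameters the handler knows about and ignore everything else (tracking parameters, for example). `KNOWN` and `read_params` are hypothetical; `parse_qs` is the standard-library parser, and only the first value of each known name is kept here.

```python
from urllib.parse import parse_qs

KNOWN = {"forum", "thread"}  # hypothetical parameters this handler uses


def read_params(query_string: str) -> dict[str, str]:
    """Keep the parameters we understand; silently ignore the rest."""
    parsed = parse_qs(query_string)
    return {name: values[0] for name, values in parsed.items() if name in KNOWN}
```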

> other parts of the stack

As a web developer, you’re like the guy standing with a clipboard outside a fancy club, checking whether the people requesting entry are allowed in. Basically, level 1 security.

If someone is not on the list, your job is to default to declining them access, not to grant them access on the assumption that level 2 security will handle them at a deeper layer.

It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

The first layer of any web security should never be checking someone against a list, unless this can be done in less than a few milliseconds. It should only be sanity checking for basic compliance. In the analogy, this first layer should be denying entry to obviously drunk people, zebras, and a stampede of protesters.
>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded complexity.

> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

there is no security benefit to filtering out unneeded url parameters.

> there is no security benefit to filtering out unneeded url parameters.

there is - security in depth.

If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.

If the set of URL params is known ahead of time (which I claim should be true), then you could make adding unknown params an error.
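The strict alternative, sketched under the assumption of a hypothetical `ALLOWED` set: any parameter outside the allowlist rejects the request. This version answers 400; the thread also argues for 404, so the status is a policy choice rather than something the code settles.

```python
from urllib.parse import parse_qs

ALLOWED = {"forum", "thread"}  # hypothetical allowlist


def validate_query(query_string: str) -> tuple[int, str]:
    """Reject any request that carries a parameter outside the allowlist."""
    # keep_blank_values=True so that "debug=" is still seen as a parameter.
    names = set(parse_qs(query_string, keep_blank_values=True))
    unknown = sorted(names - ALLOWED)
    if unknown:
        return 400, "Unknown query parameter(s): " + ", ".join(unknown)
    return 200, "ok"
```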

>If a url parameter would've been a vulnerability because something lower down the stack misinterprets it

By assumption, you are using this url parameter. So you have a bug where you've forgotten to allow this parameter, which will quickly be discovered in your logs and fixed. Then the vulnerability, which you are thus far unaware of, will quickly be exposed. Those url parameters you are not using cannot hurt you.

> there is no security benefit to filtering out unneeded url parameters.

What about passing extra data to fill the server's memory, either with extra known junk or with a script/executable to use with a zero-day in an internal component or something?

To misuse the nightclub analogy: it’s like checking that bags aren’t larger than A4 and disallowing knives and other weapons.
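In the spirit of the A4-bag check, a cheap size check can run before any real parsing or lookups. The limits are arbitrary placeholders; `max_num_fields` is a real parameter of Python's `parse_qsl` that raises `ValueError` when exceeded, and 414 is the standard URI Too Long status.

```python
from urllib.parse import parse_qsl

MAX_QUERY_BYTES = 2048  # hypothetical limit, tune for your app
MAX_PARAMS = 20         # hypothetical limit


def check_query_size(query_string: str) -> tuple[int, str]:
    """Cheap sanity checks that run before any real parsing or lookups."""
    if len(query_string.encode("utf-8")) > MAX_QUERY_BYTES:
        return 414, "URI Too Long"
    try:
        # Raises ValueError instead of building an arbitrarily long list.
        parse_qsl(query_string, max_num_fields=MAX_PARAMS)
    except ValueError:
        return 400, "Too many query parameters"
    return 200, "ok"
```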

No, 400 is correct for a bad request, as unknown query parameters are clearly a client error.
All 4xx errors are client errors.

400 is the general “bad request” client error, indicating something is wrong with the request without being specific about what.

404 is simply a more specific client error: it means the client asked for a resource that couldn’t be found.

That's because Apache is basically what today's JS crowd would call a "file-based router", and then the app implements the actual routing in that index.php file. Just like early SPAs stored the route in the URL hash. It's funny how history repeats itself.

I've gone back and forth on file-based vs programmatic routing. But each has pros and cons, so in the end I implemented both in Mastro: https://mastrojs.github.io/docs/routing/

I believe Wikipedia, and all other MediaWiki sites, still do that.
watch?v=oHg5SJYRHA0
item?id=48076173
Ooo.. burn.
Oh no, looks like my old forum software urls.
> in form-urlencoded form, people were not savages

Oh yeah? I remember a lot of semicolons from Perl and other CGI stuff back in the day, where we would now use ampersands, both in the path and in the query. (Sometimes the ? itself would be written ;.)
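Recent versions of Python's `urllib.parse` split only on `&` by default, so accepting CGI-era semicolons takes a small wrapper. This sketch (the function name is hypothetical) treats a literal `;` as a pair separator, while a percent-encoded `%3B` stays part of the value.

```python
from urllib.parse import parse_qsl


def parse_legacy_query(query_string: str) -> list[tuple[str, str]]:
    """Accept both ';' and '&' as pair separators, as some CGI-era URLs did."""
    # A literal ';' becomes a separator; an encoded %3B is untouched here and
    # is decoded back to ';' inside the value by parse_qsl.
    normalized = query_string.replace(";", "&")
    return parse_qsl(normalized, keep_blank_values=True)
```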

Correct. In fact, the semicolon is part of the URI scheme standard, and the ampersand is just some ad-hoc thing that got adopted naturally without any standardization effort.