Hacker News
by that_james 1433 days ago
I very intentionally avoided the REST vs HTTP RPC debate.

This is _specifically_ about HTTP APIs. REST is not a synonym for HTTP, but there are much better resources out there that rant on about the importance of hypertext, URL support, etc.

This is mostly about the perspective of consumers.

>It does not matter too much which part of the URL is the one that is causing the URL to be wrong.

Your consumer disagrees, they'd like to know if their URL was fat fingered or if a record was missing. My argument is 404 is inappropriate because the web service exists, but the record doesn't.

> Thus `api/v11/employees/1` and `/api/v1/employees/100` both are wrong resources.

I can't say this is wrong, but it doesn't feel right. `/api/v11` straight up resolves to nowhere. Maybe this is an instance where Gone is better than Not Found?

> Nobody is stopping anyone from adding a response body to a 404 to indicate which nested resource is invalid. That can be added as a debug message for the developer, for example.

100% agreed. It's just a thought I had while debating with my team.

I posted this here for this exact kind of feedback :D good points raised


I think debating is good and healthy, so let me do a short rebuttal about the difference between `/api/v11/employees/1` and `/api/v1/employees/100` with an example, specifically because you say that you are talking about _HTTP_ APIs.

So I will focus on HTTP then.

Say you install an nginx/apache and then have a static structure where you have the profiles of employees saved as PDFs directly on the disk.

Then you do `GET /public/v1/employees/1.pdf` and it works: it returns 200 with the content. Then you do `GET /public/v11/employees/1.pdf`; in this case all servers will return 404. The same goes for `GET /public/v1/employees/100.pdf`, which will still be 404. What if someone asks for `GET /public/v1/employeea/1.pdf`? The server will again respond with 404.

Then I go and implement a web app to replace that. I plan to keep the URLs the same, but now there is an app that will return the PDF as a data stream or file.

For me, I don't see any reason to change the behaviour of the URLs just because I replaced a static app with a dynamic web app. Any HTTP server will respond in the same way, thus the current one should respond the same.

Responding like this has a compatibility reason (let's say) behind it that is neither personal nor specific to my project.
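
A minimal sketch of that swap, assuming a hypothetical in-memory `EMPLOYEE_PDFS` store and a hypothetical `handle_get` function (neither comes from the thread). The dynamic handler answers each of the example URLs with the same status a static file tree would:

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the PDFs that used to sit on disk.
EMPLOYEE_PDFS: Dict[str, bytes] = {"1": b"%PDF-1.4 placeholder"}


def handle_get(path: str) -> Tuple[int, bytes]:
    """Return (status, body) the way a static file server would for this path."""
    parts = path.strip("/").split("/")
    # Only /public/v1/employees/<id>.pdf has a known shape; anything else is 404.
    if len(parts) != 4 or parts[:3] != ["public", "v1", "employees"]:
        return 404, b""
    filename = parts[3]
    if not filename.endswith(".pdf"):
        return 404, b""
    body = EMPLOYEE_PDFS.get(filename[: -len(".pdf")])
    if body is None:
        # Known shape, but no such employee: still 404, as with a missing file.
        return 404, b""
    return 200, body
```

A client written against the static server would not notice the replacement, which is the compatibility argument in a nutshell.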

Honestly I’m amazed that you managed to find a closer for this argument. And it works. I was _firmly_ of the “204 instead of 404” camp, but I find the “swap between static and dynamic serving” quite compelling.

It’s worth being redundantly explicit that this does not extend to all cases. There are cases where a 204 is warranted. But I’m roughly convinced that it may not be as ubiquitous as I thought. Very rad.

>Say you install an nginx/apache and then have a static structure where you have the profiles of employees saved as PDFs directly on the disk.

>Then you do `GET /public/v1/employees/1.pdf` and it works: it returns 200 with the content. Then you do `GET /public/v11/employees/1.pdf`; in this case all servers will return 404.

Which is a sensible default because the most nginx/apache can conclude is that the resource does not exist on the server. However, if we know that this server is the canonical record for these pdfs we can conclude that it doesn't exist if it's not on the server. So now we know the state (it doesn't exist) and can return its representation.

OK, now I see the point on which we disagree:

What does 200 mean with regard to existence/non-existence?

I think 200 means something exists and 404 is the representation of non-existence.

You think (please correct me if I am wrong) that the existence or non-existence information should be in the body.

Actually, I think the underlying (and more important) point is how valuable the existence/non-existence information is for the client, i.e. how quickly the client should get feedback about it.

I think existence is very important information and thus should be a first-class citizen of the data representation. Thus I want it in the status code, on the same level as the body itself.

If you put it in the body, then that means an extra step for me: parse the body and then see what is there. So the existence/non-existence is on the second level.

In your case, responding with 200 + body, the 200 status becomes irrelevant and I always need to parse the body => time is lost getting to the _first_ important piece of information, which should guard my business logic's decision to parse the body or not.

While in my case (using 200 and 404 status) the client receiving 404 knows directly (without any parsing of the body) that the request was not successful in retrieving existing data.
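
To make the two client styles concrete, here is a rough sketch. The function names and the `exists`/`data` envelope fields are assumptions made up for illustration, not anything either poster specified:

```python
import json
from typing import Optional


def employee_from_status(status: int, body: str) -> Optional[dict]:
    """Status-code style: a 404 short-circuits before any body parsing."""
    if status == 404:
        return None
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return json.loads(body)


def employee_from_body(status: int, body: str) -> Optional[dict]:
    """200-with-body style: the body must be parsed to learn about existence."""
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    payload = json.loads(body)
    # "exists" and "data" are hypothetical envelope fields.
    if not payload.get("exists", False):
        return None
    return payload["data"]
```

In the first function the missing-record case never touches the body; in the second, every response goes through `json.loads` before the client knows whether there is anything to use.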

>I think 200 means something exists and 404 is the representation of non-existence.

But that's not what the spec says:

>200 OK - The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:

> GET an entity corresponding to the requested resource is sent in the response;

>The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

404 is not the representation of non-existence. It's the representation of not found. Something can be "not found" for many more reasons than non-existence. That ultimately causes the person integrating your API much consternation, because they have no idea if it's an "Everything worked" 404 or a "My DNS is borked" 404 or a "Your server's routing is borked" 404 or half a dozen other possibilities. Sure, you might add further information to your 404 response, but that means you can't have generic 404=bad monitoring. Plus it causes headaches for people working in systems that do assume 404=bad.

200 means that the request has succeeded. And in these cases it has. You requested a representation of employee 100 and you're getting one (it doesn't exist).

Even if you disagree with the wordsmithing, the latter is far, far easier to work with.

If the people who designed the web didn't want information about the application code to show up as a status code, we wouldn't have status 500.

Originally, anything with a path was meant to simulate a directory tree of static files. We build it dynamically because that's easier to maintain. But making it look and act the same by returning 404s is historically correct.

Of course things evolve and move on. You're free to do as you wish. But to me you're making a bizarrely arbitrary distinction about what part of the application is allowed to return a 404 (routing code in the framework) and what aren't (your own code). Or did you not realize that a framework like Django isn't actually part of the webserver?

This is a bit weird to me.

Your article is entitled "I've been abusing HTTP status codes" ... but... you're not "abusing" them, you're "not using" them for your APIs. (Or, said another way, you're leaving them to their normal usage for HTTP servers.)

Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm -- your article is /entirely/ in context of REST even if you avoided mentioning it.

...

Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted".

> Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm

It's doubly not. The REST architectural style (1) is protocol neutral, rather than specific to HTTP, and (2) emphasizes using the underlying protocol, whatever it is, “as is”.

> Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted".

But those are literally the same thing. A URL/URI is a “Uniform Resource Locator/Identifier”.

“I don't have a matching resource” is a 404 (unless you are distinguishing “I had a matching resource but you missed it and it's not coming back”, which is 410.) While you might use a body message to distinguish “I would never expect to have a resource with that shape URL” from “I have resources with URLs shaped like that, but not that particular URL”, both are within the usual, RFC-defined meaning of the 404 status code.
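
As a sketch of the "same 404, different body" idea: the route pattern, the `EMPLOYEES` data, and the `error` codes below are all hypothetical, chosen only to illustrate how a body can separate the two cases while the status stays 404.

```python
import json
import re
from typing import Dict, Tuple

# Hypothetical data and route shape, for illustration only.
EMPLOYEES: Dict[int, dict] = {1: {"id": 1, "name": "Joe"}}
EMPLOYEE_ROUTE = re.compile(r"^/api/v1/employees/(\d+)$")


def get_employee(path: str) -> Tuple[int, str]:
    """Answer 404 in both failure cases, with a body saying which one it was."""
    match = EMPLOYEE_ROUTE.match(path)
    if match is None:
        # No resource has a URL shaped like this.
        return 404, json.dumps({"error": "unknown_route", "path": path})
    employee_id = int(match.group(1))
    employee = EMPLOYEES.get(employee_id)
    if employee is None:
        # The URL shape is fine, but this particular record is absent.
        return 404, json.dumps({"error": "employee_not_found", "id": employee_id})
    return 200, json.dumps(employee)
```

Generic clients can keep treating 404 as "not found", while a developer debugging a typo gets the extra detail from the body.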

Your argument is obviously what has been normalized in REST APIs, but it's not user friendly AND it's OP's whole point. He built his entire article -- and apparently his APIs -- around avoiding 404 ambiguity.

If you hit an endpoint and get a 404... did you do it wrong? Is my documentation outdated?

Even better: what recourse do you have? How do you figure out the answer?

Your only recourse is to email me. Send me cURL commands and screenshots and sit on your thumbs until I write back.

IMHO the REST folk were blinded by the existing 404 normalcy set up by web servers.

...

Personally, I think OP's idea isn't great. Returning 200 and making me parse the response, hoping it's consistent between services, is too much work compared to the simplicity of the HTTP response.

Instead, I'd change the default from 404 to 501.

    HTTP 501 - not implemented (URI is not working)
    HTTP 404 - resource not found (Joe doesn't exist in db)

501 is for unrecognized methods. As I was about to say this would then be incorrect usage, it occurred to me that you could in fact use this for an RPC system if the procedure name were used as the HTTP method.

So instead of

    GET /api/v1/employee/100
    Accept: application/json

you sent

    GetEmployee /1000
    X-API-VERSION: 1
    Accept: application/json

Then if the actual procedure name was "get_employees", the correct response to this request would be 501, and /1000 referring to a non-existent employee would be a 404.
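
A toy dispatcher for this procedure-name-as-method idea might look like the following. The `PROCEDURES` table, the `dispatch` function, and the sample data are hypothetical, and no real HTTP server is involved:

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical data: request targets mapped to employee records.
EMPLOYEES: Dict[str, dict] = {"/1": {"id": 1, "name": "Joe"}}


def get_employees(target: str) -> Optional[dict]:
    """Look up an employee by request target; None means no such record."""
    return EMPLOYEES.get(target)


# The procedure name plays the role of the HTTP method.
PROCEDURES: Dict[str, Callable[[str], Optional[dict]]] = {
    "get_employees": get_employees,
}


def dispatch(method: str, target: str) -> Tuple[int, Optional[dict]]:
    procedure = PROCEDURES.get(method)
    if procedure is None:
        return 501, None  # unrecognized method, i.e. unknown procedure
    result = procedure(target)
    if result is None:
        return 404, None  # procedure exists, record does not
    return 200, result
```

Here `dispatch("GetEmployee", "/1000")` yields 501 because the procedure name is wrong, while `dispatch("get_employees", "/1000")` yields 404 because only the record is missing.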

If making an RPC and restricting yourself to the known HTTP methods, the closest is

    GET /api/v1/employee?id=100
    Accept: application/json

which would return 404 only if the controller endpoint didn't exist, and would return whatever the application wanted if id=100 didn't exist, such as 422 or 200 with a response indicating non-existence. It would be just like a local procedure call that could return a false value, or throw an exception instead of returning a value.

> If you hit an endpoint and get a 404... did you do it wrong? Is my documentation outdated?

Sure, you might want information in addition to that provided by the status code. And, again, rather than reinventing the wheel with some ad hoc mechanism, you can follow the HTTP spec for a solution: almost all HTTP status codes support a response body to communicate additional detail.

> Instead, I'd change the default from 404 to 501.

5xx errors indicate server problems, not request problems. If you wanted a different status code for “that path isn't structured in a way I understand” vs “I understand how I would look up something with that path but can't find it”, 400 or 421 for the former would be better than anything that is not a 4xx, since they each (1) are in the correct class and report a client error, and (2) have a definition which arguably fits the scenario, even if 404 arguably fits better.

I recalled there was a "not found" response and used it whilst spitballing the response above, but you're absolutely correct. 400 (Bad Request) and 421 (Misdirected Request), or even 409 (Conflict), as another poster mentioned, would be great responses in that scenario.

The main "issue" is that 404 is the normalized response for web servers when an endpoint doesn't exist. So it feels like one is breaking the established paradigm by using something else, but I think it's absolutely worth doing.

Certainly a bigger fan of defaulting to returning a 409 than making my API consumers parse all my response bodies.

> Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm -- your article is /entirely/ in context of REST.

Oof, that's a hell of a good point. So much for that plan lol

> Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted". Seems like, for REST, you'd want to return a 400 (malformed request) or something if your URL was borked rather than overloading 404.

Yup, that's the headache I'm trying to muddle my way through.

Really it's less "this is how to build APIs" and more "have you considered your consumer when you return data?". But I think even in that context your point stands better.

Back to the drawing board it seems.

At least I can generate more content now :D

If you go down this path (pun not intended) and consider the structure of a URL to have semantic content, yet still want HTTP-compliant and meaningful errors, another common scenario would be 409 Conflict. This is the HTTP way to say it's not possible to process this request right now, while not suggesting it will never be possible.

This is most appropriate when the URL does not make sense with the current state of the server, but other future operations could (in theory) change the server state such that it does make sense. This might make sense if you have a user-extensible data model, where some kind of relationship is being mapped into the path structure and you want to signal that this relationship is not currently known to the query system, but _could be_ in the future.

Now, you are faced with a decision for when this ephemeral status is a semantic conflict, when it is simply a resource not found, or when it is a forbidden request for the current user.

The last is subtle and depends on the rest of your security posture. Do you want to tell your user "this is possible/available for a sufficiently privileged user, just not for you", or do you want to avoid leaking information about higher-privilege roles? This is similar to the debate about whether a login UX should tell you whether you have an invalid user id versus password, or just say login failed without leaking more information to a potential attacker.
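
One way to picture the three-way decision is as a small function. This is only a sketch: `choose_status` and its boolean inputs are hypothetical names, and the order of the checks is itself a design choice rather than anything HTTP mandates.

```python
def choose_status(
    relationship_defined: bool,
    record_exists: bool,
    caller_allowed: bool,
    hide_existence: bool = True,
) -> int:
    """Pick between 409, 404, 403 and 200 for a path that encodes a relationship."""
    if not relationship_defined:
        # The path does not fit the current server state, but could in the future.
        return 409
    if not record_exists:
        return 404
    if not caller_allowed:
        # 404 avoids revealing that the record exists; 403 admits that it does.
        return 404 if hide_existence else 403
    return 200
```

Flipping `hide_existence` is the code-level version of the login-form question: tell the caller exactly what went wrong, or deliberately say less.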

> Your consumer disagrees, they'd like to know if their URL was fat fingered or if a record was missing.

Why?

How often does this really come up?

Who is typing in URLs like this manually?

If you're typo'ing it in code, are you not doing any kind of validation/testing against any kind of spec that can catch this?

Why is it up to the actual webservice returning a 404 to catch these kinds of typo errors?

And I'm not saying I disagree with the argument -- I fully get the argument that was made, but practically the fact that you're caring about it suggests you're missing other components in your stack. You're producing a URL request which is outside of the spec of legal URLs for the webservice. You can validate that before you ever make a real web request against a real server.
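
As a rough illustration of validating before sending, assume the service's legal paths can be written as patterns. The `LEGAL_PATHS` list and `is_legal_path` helper are hypothetical; a real setup would more likely derive them from the API's published spec.

```python
import re
from typing import List, Pattern

# Hypothetical list of path shapes the web service documents as legal.
LEGAL_PATHS: List[Pattern[str]] = [
    re.compile(r"/api/v1/employees/\d+"),
]


def is_legal_path(path: str) -> bool:
    """Return True only if the whole path matches one of the documented shapes."""
    return any(pattern.fullmatch(path) for pattern in LEGAL_PATHS)
```

With a check like this in the client or its test suite, `/api/v11/employees/1` is rejected before any request goes out, so a 404 from the real server can only mean the record is missing.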

What does this api do? Get? Update? Remove?

You can’t ignore the HTTP verbs, so your article doesn’t make sense. You also shouldn’t ignore status codes.

You also might get in trouble with caching.

You can easily use status codes and provide all the detail plus a status field in the response. It makes consuming a lot easier.

> Maybe this is an instance where Gone is better than Not Found?

Gone means a resource identified by a URI existed, it no longer exists, and that resource (not the URI, necessarily) will never be available again.