Hacker News new | ask | show | jobs
by worksonmine 854 days ago
> What value would there be in preventing guessing?

It prevents enumeration, which may or may not be a problem depending on the data. If you want to build a database of user profiles it's much easier with incremental IDs than UUID.

It is at least a data leak but can be a security issue. Imagine a server doing wrong password correctly returning "invalid username OR password" to prevent enumeration. If you can still crawl all IDs and figure out if someone has an account that way it helps filter out what username and password combinations to try from previous leaks.

Hackers are creative and security is never about any single protection.

1 comments

> If you can still crawl all IDs and figure out if someone has an account that way it helps filter out what username and password combinations to try from previous leaks.

Right, but like I suggested above, if you're able to get any response other than a 404 for an ID other than one you're authorized to access, then that in and of itself is a severe issue. So is being able to log in with that ID instead of an actual username.

Hackers are indeed creative, but they ain't wizards. There are countless other things that would need to go horribly horribly wrong for an autoincrementing ID to be useful in an attack, and the lack of autoincrementing IDs doesn't really do much in practice to hinder an attacker once those things have gone horribly, horribly wrong.

I can think of maybe one exception to this, and that's with e-commerce sites providing guest users with URLs to their order/shipping information after checkout. Even this is straightforward to mitigate (e.g. by generating a random token for each order and requiring it as a URL parameter), and is entirely inapplicable to something like GitLab.

> Right, but like I suggested above, if you're able to get any response other than a 404 for an ID other than one you're authorized to access, then that in and of itself is a severe issue. So is being able to log in with that ID instead of an actual username.

You're missing the point and you're not thinking like a hacker yet. It's not about the ID itself or even private profiles, but the fact that you can build a database of all users with a simple loop. For example your profile here is '/user?id=yellowapple' not '/user?id=1337'.

If it was the latter I could build a list of usernames by testing all IDs. Then I would cross-reference those usernames to previous leaks to know what passwords to test. And hacking an account is not the only use of such an exploit, just extracting all items from a competitors database is enough in some cases. It all depends on the type data and what business value it has. Sometimes an incrementing ID is perfectly fine, but it's more difficult to shard across services so I usually default to UUID anyway except when I really want an incrementing ID.

Most of the time things don't have to go "horribly horribly wrong" to get exploited. It's more common to be many simple unimportant holes cleverly combined.

The username can still always be checked for existence on the sign-up step, and there aren't many ways of protecting from that. But it's easier to rate-limit sign-ups (as one should anyway) than viewing public profiles.

Do you leave your windows open when you leave from home just because the burglar can kick the front door in instead? It's the same principle.

> Do you leave your windows open when you leave from home just because the burglar can kick the front door in instead?

Yes (or at least: I ain't terribly worried if I do forget to close the windows before leaving), because the likelihood of a burglar climbing up multiple floors to target my apartment's windows is far lower than the likelihood of the burglar breaking through my front door.

But I digress...

> You're missing the point and you're not thinking like a hacker yet.

I'd say the same about you. To a hacker who's sufficiently motivated to go through all those steps you describe, a lack of publicly-known autoincremented IDs is hardly an obstacle. It might deter the average script kiddie or botnet, but only for as long as they're unwilling to rewrite their scripts to iterate over alphanumerics instead of just numerics, and they ain't the ones performing attacks like you describe in the first place.

> For example your profile here is '/user?id=yellowapple' not '/user?id=1337'.

In either case that's public info, and it's straightforward to build a list of at least some subset of Hacker News users by other means (e.g. scraping comments/posts - which, mind you, are sequential IDs AFAICT). Yes, it's slightly more difficult than looping over user IDs directly, but not by a significant enough factor to be a worthwhile security mitigation, even in depth.

Unless someone like @dang feels inclined to correct me on this, I'm reasonably sure the decision to use usernames instead of internal IDs in HN profile URLs has very little to do with this particular "mitigation" and everything to do with convenience (and I wouldn't be all that surprised if the username is the primary key in the first place). '/user?id=worksonmine' is much easier to remember than '/user?id=8675309', be it for HN or for any other social network / forum / etc.

> Sometimes an incrementing ID is perfectly fine, but it's more difficult to shard across services so I usually default to UUID anyway except when I really want an incrementing ID.

Sharding is indeed a much more valid reason to refrain from autoincrementing IDs, publicly-known or otherwise.

> multiple floors

But not if you lived on the ground floor I assume?

> iterate over alphanumerics instead of just numerics

Why would you suggest another sequential ID as a solution to a sequential ID? I didn't, UUIDs have decent entropy and are not viable to brute force. Don't bastardize my point just to make a response.

> I'm reasonably sure the decision to use usernames instead of internal IDs in HN profile URLs has very little to do with this particular "mitigation" and everything to do with convenience

It wasn't intended as an audit of HN, I was holding your hand when walking you through an example and chose the site we're on. I missed my mark and I don't think a third attempt will do much difference when you're trying so hard not to get it. If you someday have public data you don't want a competitor to enumerate you'll know the solution exists.

> But not if you lived on the ground floor I assume?

Still wouldn't make much of a difference when my windows are trivial to break into even when closed.

> Why would you suggest another sequential ID

I didn't.

> UUIDs have decent entropy and are not viable to brute force.

They're 128 bits (for UUIDv4 specifically; other varieties in common use are less random). That's a pretty good amount of entropy, but far from insurmountable.

And you don't even need to brute-force anything; if these IDs are public, then they're almost certainly being used elsewhere, wherein they can be captured. There's also nothing stopping an attacker from probing IDs ahead of time, prior to an exploit being found. The mitigations to these issues are applicable no matter the format or size of the IDs in question.

And this is assuming the attacker cares about the specific case of attacking all users. If the attacker is targeting a specific user, or just wants to attack the first user one finds, then none of that entropy matters in the slightest.

Put simply: there are enough holes in the "random IDs are a meaningful security measure" argument for it to work well as a strainer for my pasta.

> when you're trying so hard not to get it

Now ain't that the pot calling the kettle black.

> If you someday have public data you don't want a competitor to enumerate you'll know the solution exists.

If I don't want a competitor to enumerate it then I wouldn't make the data public in the first place. Kinda hard to enumerate things when the only result of attempting to access them is a 404.

> I didn't

Then how would you iterate it? You said:

>> iterate over alphanumerics instead of just numerics

I assume you mean kind of like youtube IDs where 123xyz is converted to numerics. I wouldn't call brute-forcing iterating, at least not in the sense we're discussing here.

> They're 128 bits (for UUIDv4 specifically; other varieties in common use are less random). That's a pretty good amount of entropy, but far from insurmountable.

If you generate a billion UUIDs every second for 100 years you have a 50% chance to have 1 (one!) collision. It's absolutely useless to try to guess even a small subset.

> And you don't even need to brute-force anything; if these IDs are public, then they're almost certainly being used elsewhere, wherein they can be captured.

Sessions can be hijacked anyway, so why not leave the user session db exposed to the web unprotected. Right? There will always be holes left when you fix one. That doesn't mean you should just give up and make it too easy.

> And this is assuming the attacker cares about the specific case of attacking all users. If the attacker is targeting a specific user, or just wants to attack the first user one finds, then none of that entropy matters in the slightest.

So your suggestion is to leave them all incrementing instead, do I understand you correctly?

> Put simply: there are enough holes in the "random IDs are a meaningful security measure" argument for it to work well as a strainer for my pasta.

It's such a simple thing to solve though that it doesn't really matter.

> If I don't want a competitor to enumerate it then I wouldn't make the data public in the first place. Kinda hard to enumerate things when the only result of attempting to access them is a 404.

There are lots of things you may want to have public but not easily crawlable. There might not even be the concept of users and restrictions. A product returning 404 for everything to everyone might not be very useful to anyone. You've been given plenty of examples in other comments, and I am sure you understand the points.

One example could be recipes, and you want to protect your work while giving real visitors (not users, there are no users to hack) the ability to search by name or ingredients. With incremental IDs you can scrape them all no problem and steal all that work. With a UUID you have to guess either the UUID for all, or guess every possible search term.

Another could be a chat app, and you don't want to expose the total number of users and messages sent. If the only way to start a chat is knowing a public key and all messages have a UUID as PK how would you enumerate this? With incrementing IDs you know you are user 1337 and you just sent message 1,000,000. With random IDs this is impossible to know. Anyone should still be able to add any user if the public key is known, so 404 is no solution.

I'm sure you'll have something to say about that as well. The point is to make it difficult to abuse, while still being useful to real visitors. I don't even understand your aversion to random IDs, are they so difficult to implement? What's the real problem?