by babbledabbler 1094 days ago
I think you raise some good questions around IDs and PII and we definitely will be tackling GDPR sooner or later.

I don't quite follow the European regulation issues raised by using a UUID in a route when that UUID is also the PK of the record.

I know you should not expose PII or any information that can be used to identify a person. However, in our case any route is behind an authed login on an SSL connection which encrypts the path (we don't use query params).

The only place that contains data that ties a UUID to a person is the database. This would be the case whether or not we used an integer as the PK.

Could you elaborate or share any resources on dual-ID DB design for PII compliance? That would be super helpful.

Regarding framework hacking or workarounds, I have a principle to not go against the grain of a framework. The reason for this is that modifying/hacking adds complexity when building on top of it or onboarding other software engineers. If necessary I'll do it as a last resort.

1 comments

> any route is behind an authed login on an SSL connection which encrypts the path

If your application only services a single user with their own resources then you have nothing to fear. But few applications meet this definition. If, for example, you're running an invoicing application, then at some point you'll want to share some resource, say an invoice or an expense or a time sheet, with another party. If your API exposes the identifiers from one resource to another, or even a user's id when potentially adding them to a team, then these identifiers are considered PII according to European regulators.

I understand that this is frustrating, but it comes from a posture that prioritizes right-to-be-forgotten over programmer ergonomics. Imagine, for example, API crawlers that hit your /search endpoint with email=[some predetermined list of emails] and harvest user ids to match with future data.

In the end, the best thing you can do is keep join keys internal and keep API keys (the identifiers you expose externally) separate from them. There are other workarounds, but they're so much trouble that they aren't really viable alternatives. Now, whether you use UUIDs for both identifiers or UUIDs for external and integer ids for join keys is up to you and your performance and scaling requirements. Personally, I prefer integer keys for internal use unless I really expect the database to grow to more than 200m rows before the company hits 1000 people. Integer ids mean you do not need secondary indexes on things like the created_at fields, though even there it's not such a big deal to have an extra index on every table.
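To make the split concrete, here is a minimal sketch in Python with SQLite. Everything in it is an assumption for illustration: the `users` and `invoices` tables, the `public_id` column name, and the helper functions are hypothetical, not anything from your codebase or framework. The integer `id` is only ever used for joins, and only `public_id` crosses the API boundary.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with separate internal and external ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id        INTEGER PRIMARY KEY,   -- internal join key, never sent to clients
            public_id TEXT NOT NULL UNIQUE,  -- external id used in routes and payloads
            email     TEXT NOT NULL
        );
        CREATE TABLE invoices (
            id          INTEGER PRIMARY KEY,
            public_id   TEXT NOT NULL UNIQUE,
            user_id     INTEGER NOT NULL REFERENCES users(id),  -- joins use the internal key
            total_cents INTEGER NOT NULL
        );
    """)
    return conn


def create_user(conn, email):
    """Insert a user and return only the external id."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (public_id, email) VALUES (?, ?)",
        (public_id, email),
    )
    return public_id


def create_invoice(conn, user_public_id, total_cents):
    """Resolve the external user id to the internal key, then insert."""
    row = conn.execute(
        "SELECT id FROM users WHERE public_id = ?", (user_public_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_public_id)
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO invoices (public_id, user_id, total_cents) VALUES (?, ?, ?)",
        (public_id, row[0], total_cents),
    )
    return public_id


def get_invoice(conn, invoice_public_id):
    """Return the API representation; internal ids are deliberately left out."""
    row = conn.execute(
        "SELECT public_id, total_cents FROM invoices WHERE public_id = ?",
        (invoice_public_id,),
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "total_cents": row[1]}
```

A nice side effect of this layout, under the same assumptions, is that an external id can be regenerated or deleted without touching any foreign keys, since nothing joins on it.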

> I have a principle to not go against the grain of a framework.

> hacking adds complexity

Here we essentially agree, but with the right integration tests, upgrading and onboarding are a lot easier than feared. That said, do not add to the framework unless the benefit is worth it.

I don't really find these considerations frustrating, just a bit tricky. Regardless, I definitely agree with GDPR and am on board with keeping PII secure from the get-go.

I'm still having a little trouble grokking when an ID becomes exposed or shared, so I guess I'll just have to read up on this, as it's certainly important.

I realized that in our system user IDs are neither shared nor linked to (at least not yet), so in actuality the case where a URL contains a UUID representing a person does not occur. Generated content does not reference UUIDs for persons either. There are URLs with UUIDs representing other types of resources.

I take API key to mean an access key used as an external reference. That's a good alternative to replacing the integer PK with a UUID PK: keep the PK as it is and add an external UUID field. That would satisfy the concern with maintaining integer sequences and migrating data.
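As I understand it, the migration could look roughly like the sketch below: add an external UUID column to an existing table and backfill it, leaving the integer PK and its sequence alone. This is only a sketch under assumptions. It uses Python with SQLite, the `public_id` column name and the `add_external_ids` helper are hypothetical, and a real framework would have its own migration tooling.

```python
import sqlite3
import uuid


def add_external_ids(conn, table):
    """Add and backfill a public_id column without touching the integer PK.

    The table name is interpolated into the SQL, so it must come from
    trusted code and never from user input.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN public_id TEXT")
    rows = conn.execute(f"SELECT id FROM {table} WHERE public_id IS NULL").fetchall()
    for (row_id,) in rows:
        conn.execute(
            f"UPDATE {table} SET public_id = ? WHERE id = ?",
            (str(uuid.uuid4()), row_id),
        )
    # Enforce uniqueness only after the backfill so the index build succeeds.
    conn.execute(f"CREATE UNIQUE INDEX idx_{table}_public_id ON {table}(public_id)")
    conn.commit()
```

Existing foreign keys keep pointing at the integer `id`, so no related rows need rewriting.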

Anyway, this has been helpful, so thank you for sharing your thoughts. I have some things to look into to stay in the good graces of European regulators.