Hacker News
by fabian2k 1586 days ago
This is probably a pretty stupid question, or at least based on some misconception of mine about this space. But I don't really understand how permissions as a service or API can work efficiently.

If I request a single resource, of course this can work if I ask a second API on whether the request is allowed or not. But if I query a database for a list of items, to add access control I need to modify the database query. I can't just filter after the fact, it's too easy to cause pathological performance issues there e.g. if the user has only access to a very small subset of a large list of results. How does this work with a separate access control API that can't directly modify the database query?

6 comments

Disclaimer: I am a founder of Cerbos. At Cerbos, rather than writing policies in Rego, you write them in much simpler YAML/JSON (much more like AWS IAM).

(a bit late to the party)

Hi Fabian, at Cerbos we had to handle this issue as well, and wrote a blog post [1] about how we convert a policy into a generic AST that you can use in the data filtering logic for your data store. This way your data store queries fetch only the relevant records.
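
A rough sketch of that idea, for illustration only. This is not Cerbos's actual AST format or the Prisma plugin's API: the `Node` shape and the `to_filter` function are hypothetical, and the output keys (`equals`, `in`, `AND`, `OR`, `NOT`) are only loosely modeled on ORM-style filter objects. It shows how a policy condition expressed as a tree could be turned into a nested filter that a data layer accepts.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical AST node; a real policy engine defines its own shape.
@dataclass
class Node:
    op: str                          # "and", "or", "not", "eq", "in"
    attr: Optional[str] = None       # attribute name, set on leaf comparisons
    value: Any = None                # comparison value, set on leaf comparisons
    children: list = field(default_factory=list)


def to_filter(node: Node) -> dict:
    """Recursively convert the AST into a nested, ORM-style filter dict."""
    if node.op == "eq":
        return {node.attr: {"equals": node.value}}
    if node.op == "in":
        return {node.attr: {"in": list(node.value)}}
    if node.op == "not":
        return {"NOT": to_filter(node.children[0])}
    if node.op in ("and", "or"):
        return {node.op.upper(): [to_filter(c) for c in node.children]}
    raise ValueError(f"unsupported operator: {node.op}")


# "owner is alice OR (status is published AND team is red or blue)"
policy_ast = Node("or", children=[
    Node("eq", attr="owner_id", value="alice"),
    Node("and", children=[
        Node("eq", attr="status", value="published"),
        Node("in", attr="team", value=["red", "blue"]),
    ]),
])

where = to_filter(policy_ast)
```

The resulting `where` dict would then be passed to the query layer, so only matching records are fetched in the first place.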

To showcase how this works, we have released a Prisma ORM plugin [2] that converts our AST to Prisma filters - you can see a demo on Prisma’s YouTube channel[3]

[1]: https://cerbos.dev/blog/filtering-data-using-authorization-l...

[2]: https://cerbos.dev/blog/fully-featured-authorization-for-you...

[3]: https://youtu.be/lqiGj02WVqo?t=3616

Disclaimer: I am a founder of Authzed (W21).

Generally, this problem is called ACL-Filtering[0][1] and can be done in two ways: "pre-filter" and "post-filter". Sometimes you might even have to do both.

If you decide to use a service/database for permissions, similar to SpiceDB[2], there are often specialized APIs for directly listing the entities a subject has access to in various ways. You can take these results and feed them into a database query to select only the authorized content. This doesn't have to be just a list of IDs; it can also be data structures like bitmaps, effectively providing your database with a custom index for your query. Systems that implement some of the novel parts of the Zanzibar paper[3] can also enable you to cache these values in your database until your application performs an operation that invalidates the results.
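
A minimal sketch of the pre-filter approach, under stated assumptions: `lookup_resources` and the `GRANTS` table are hypothetical stand-ins for a permission service's "list accessible resources" call (they are not SpiceDB's API), and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical in-memory stand-in for the permission service.
GRANTS = {"alice": {2, 5}, "bob": {1, 2, 3}}


def lookup_resources(subject):
    """Return the IDs the subject may view (a network call in a real system)."""
    return sorted(GRANTS.get(subject, set()))


def list_documents(conn, subject):
    """Pre-filter: ask for the authorized IDs first, then query only those rows."""
    ids = lookup_resources(subject)
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = (
        "SELECT id, title FROM documents "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(i, f"doc-{i}") for i in range(1, 7)],
)

alice_docs = list_documents(conn, "alice")
```

For large ID sets, an `IN` list like this would probably need to become a temporary table or a more compact structure, which is where the bitmap idea comes in.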

Filtering once you've queried all possible results from your database can also be more performant than you'd think, because you can amortize performance by lazy loading and performing permission checks in parallel. We have some pretty large systems that are purely using this strategy. The code for filtering can also be made extremely elegant because it can be hidden behind the iterator interface in whatever programming language you're using.
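
A minimal sketch of post-filtering behind an iterator, assuming a per-item check function. The names `authorized` and `can_view` are hypothetical, and `can_view` is a toy predicate standing in for a call to the permission service. Candidates are pulled lazily in batches, and the checks within each batch run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def authorized(items, check, batch_size=50, workers=8):
    """Lazily yield only the items for which check(item) is true."""
    it = iter(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                return
            # Checks in a batch run concurrently; pool.map preserves order.
            for item, ok in zip(batch, pool.map(check, batch)):
                if ok:
                    yield item


def can_view(doc_id):
    # Hypothetical check: only every 100th document is visible.
    return doc_id % 100 == 0


# The caller just iterates; only as many batches as needed are checked.
first_three = list(islice(authorized(range(1, 10_000), can_view), 3))
```

Because the generator is lazy, a caller asking for one page of results pays only for the batches needed to fill that page, although a user with very sparse access can still force many batches to be scanned.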

[0]: https://docs.authzed.com/reference/glossary#acl-filtering

[1]: https://authzed.com/blog/acl-filtering-in-authzed/

[2]: https://github.com/authzed/spicedb

[3]: https://authzed.com/blog/what-is-zanzibar/

So the idea is that you create a candidate set of resource keys from the permission system and join that with the external database and / or use it as a post filter?
@jzelinskie care to respond? I am really interested in the answer.
You're correct. The only thing I'd add is that post filters can also be done without a candidate set of resources by performing individual permission checks for each potential resource. This is slower, but, as I mentioned, it can actually perform better than you'd think with some tricks.

Apologies for the delayed response.

We've passed on most of the Google-pointing technologies in this thread precisely because of that architectural footgun.

Instead, we went with Casbin (Microsoft Research) because it can push to your DB (including multi-tenant sharing if you scale) and is a legit modern policy engine: A(R)BAC, ACL, etc. Definitely warts, but pretty close to what I'd hope for architecturally, meaning a clear path to prettier UIs, plugging into automatic SMT solvers/verifiers, etc., and until then, pretty easy to use from whatever backend lang + SQL DB you use.

Long-term, stuff like row-level security in your SQL / metadata store makes a lot of sense (people are pointing that out in the thread below), but RLS is still awkward in practice for even basic enterprise RBAC/ACL policies. Until then, Casbin-style architectures are the equivalent of a flexible external policy decision point with the actual compute still being pushed down to wherever you want, including the DB: win/win.
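
To make the pushdown idea concrete, here is a toy sketch. It is not Casbin's API or storage schema: the table names, columns, and the `list_allowed` helper are all assumptions for illustration. The point is only that when policy rows live next to the data, the database can do the authorization join itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant TEXT, title TEXT);
CREATE TABLE role_assignments (subject TEXT, role TEXT, tenant TEXT);
CREATE TABLE policies (role TEXT, action TEXT);

INSERT INTO documents VALUES
    (1, 'acme', 'plan'), (2, 'acme', 'budget'), (3, 'globex', 'roadmap');
INSERT INTO role_assignments VALUES
    ('alice', 'viewer', 'acme'), ('bob', 'viewer', 'globex');
INSERT INTO policies VALUES
    ('viewer', 'read'), ('editor', 'read'), ('editor', 'write');
""")


def list_allowed(conn, subject, action):
    """Let the database join data rows against policy rows in one query."""
    sql = """
        SELECT DISTINCT d.id, d.title
        FROM documents d
        JOIN role_assignments ra ON ra.tenant = d.tenant
        JOIN policies p ON p.role = ra.role
        WHERE ra.subject = ? AND p.action = ?
        ORDER BY d.id
    """
    return conn.execute(sql, (subject, action)).fetchall()


alice_read = list_allowed(conn, "alice", "read")
alice_write = list_allowed(conn, "alice", "write")
```

The policy decision logic stays data-driven and external to the application code, while the filtering itself happens inside the query.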

I wish the VC money went this way instead, but I see why $ goes to simpler "google for everyone else" pitches, so here we are :(

You bring up a good point with respect to the "Google for everyone else" technologies. The fact is that very few organizations are the size of Google (or have the SRE team / expertise that Google does). Zanzibar works at Google because they have a geo-scale private fiber investment, and an SRE team that can operate many global instances. The cache consistency elements of Zanzibar are the hard problem here.

We chose a more pragmatic approach with Aserto. We believe that most authorization problems can be expressed as a combination of rules and data. A system that is 100% rules or 100% data isn't pragmatic.

It's actually a pretty great question!

As others have mentioned, authorization often requires both a single method to authorize "can the user perform this action on this resource" as well as more flexible versions like "what are all the resources this user can perform this action on". That's one part of why authorization is hard; I wrote an article on this a little while ago [1]. At Oso (disclaimer: I'm the CTO), we solve this by turning authorization logic into SQL [2].

Supporting those APIs in a generic service definitely turns up the difficulty level -- you no longer have a single database to query.

The thing is, if you have multiple services, you might already be in that situation. If I need to query another service to, e.g., find what projects a user belongs to, and then need to go combine that with data in my database, I'm going to need to start worrying about how to do that efficiently. In those situations, starting to centralize that data + logic starts making sense -- we talk about this in [3]. So now there's a bunch of companies with different takes on how to best solve this, including Oso.

---

I feel bad about self-promoting so many links here... but we're passionate about this subject so we've been writing a lot about it!

[1]: https://www.osohq.com/post/why-authorization-is-hard

[2]: https://www.osohq.com/post/authorization-logic-into-sql

[3]: https://www.osohq.com/post/microservices-authorization-patte...

Hey samjs, we've been using Oso at Source.ag for a month or so, and we're really happy with it! Precisely the fact that you solve authorization on a resource level and implement filtering on the DB level makes it super useful!

The biggest gripe we have is the lack of support for SQLAlchemy 2.0 style queries and for DB & Python enums as role names.

We had a chat with Graham who told us about your upcoming cloud offering. Looking forward to that!

Thank you! We're looking forward to sharing more about Oso cloud too :)
Welp, you know you're solving a hard problem when two other founders drop links in your HN thread :)

More seriously, I agree that there are a number of challenges, and different use-cases tend to require different approaches. Over time, we think there will be a set of common patterns that emerge, which will help the industry move towards a more consistent set of authorization experiences. And that will be great for everyone.

Great question. There are two scenarios that are relevant for authorization:

1. Gating the operation (whether it's retrieving a single resource, or creating / updating / deleting a resource). In this scenario, the application does need to call the authorizer before performing the operation on the resource, but the relationship between the authorizer and the application is at "arm's length".

2. Filtering a list of resources. In this scenario, the authorization system can help you by running what in OPA is known as a "partial evaluation", which returns an abstract syntax tree. Your code would then walk that tree and insert the proper where clauses (if you're talking to a SQL back-end). As you mentioned, by necessity, your application needs to work more closely with the authorizer.
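
A minimal sketch of the second scenario, with loud caveats: the nested-tuple tree below is a hypothetical stand-in for the residual conditions a partial evaluation might return, not OPA's actual output format, and `to_where`, `ALLOWED_COLUMNS`, and the `items` table are made up for the example. It walks the tree and produces a parameterized WHERE clause.

```python
import sqlite3

# Hypothetical residual conditions: the user is known, the resource is not.
# Each node is a tuple of (op, *args).
residual = (
    "or",
    ("eq", "owner", "alice"),
    ("and", ("eq", "visibility", "public"), ("lt", "sensitivity", 3)),
)

ALLOWED_COLUMNS = {"owner", "visibility", "sensitivity"}
COMPARISONS = {"eq": "=", "lt": "<", "gt": ">"}


def to_where(node):
    """Return (sql_fragment, params) for a residual condition tree."""
    op, *args = node
    if op in COMPARISONS:
        column, value = args
        if column not in ALLOWED_COLUMNS:  # never interpolate unknown names
            raise ValueError(f"unknown column: {column}")
        return f"{column} {COMPARISONS[op]} ?", [value]
    if op in ("and", "or"):
        parts, params = [], []
        for child in args:
            sql, child_params = to_where(child)
            parts.append(f"({sql})")
            params.extend(child_params)
        return f" {op.upper()} ".join(parts), params
    raise ValueError(f"unsupported operator: {op}")


clause, params = to_where(residual)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items "
    "(id INTEGER, owner TEXT, visibility TEXT, sensitivity INTEGER)"
)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, "alice", "private", 5),
    (2, "bob", "public", 1),
    (3, "bob", "public", 4),
    (4, "bob", "private", 1),
])
rows = conn.execute(
    f"SELECT id FROM items WHERE {clause} ORDER BY id", params
).fetchall()
```

Values travel as bound parameters and column names are checked against an allow-list, so the policy output never gets spliced into the SQL unchecked.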

If you are using Postgres (or YugabyteDB) then you can use row level security for that.
Authorization is really about "defense in depth". In a ZTA model, your access proxy, authentication system, API gateway, application middleware, and data layer all provide additional levels of protection [0].

Using your DB's row-level security for data filtering is definitely complementary to API authorization.

[0] https://www.aserto.com/blog/modern-authorization-requires-de...

Yup, and I'd use Keycloak with a PostgreSQL extension to drive both together. I'd drive RLS from Keycloak if I was going to go that way.