by kevinmobrien 4190 days ago
>> "API keys [...] need to be in your committed code"

> No they don't.

A better approach is what the rest of the comments here are suggesting:

(1) store your secrets (API keys, certs, credentials, whatever) in a highly-secure system, with both strong encryption and immutable audit logging around their access and modification;

(2) expose those secrets at run/compile-time via variables, such that the secret is never stored on-disk anywhere other than in the highly-secure system from (1) and in transient storage while in use;

(3) wrap an authz layer around variable access, so that only authorized services/users/hosts (those that have authenticated properly and are allowed access via the authz policy established here) can read/write/mutate the secrets.
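To make (1) and (3) concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `SecretStore`, the policy shape, and the identity names are made up, the store is in-memory, and the encryption from (1) is left out entirely. It only shows the idea of checking an authz policy on every access and recording every attempt, allowed or not, in an append-only log.

```python
import time


class SecretStore:
    """Toy stand-in for the highly-secure system in (1).

    Assumption: in-memory and unencrypted, purely to show the control flow.
    """

    def __init__(self, policy):
        # policy maps (identity, secret_name) -> set of allowed actions
        self._policy = policy
        self._secrets = {}
        self._audit = []  # only ever appended to in this sketch

    def _check(self, identity, name, action):
        allowed = action in self._policy.get((identity, name), set())
        # Record every attempt, including denied ones.
        self._audit.append((time.time(), identity, name, action, allowed))
        if not allowed:
            raise PermissionError(f"{identity} may not {action} {name}")

    def write(self, identity, name, value):
        self._check(identity, name, "write")
        self._secrets[name] = value

    def read(self, identity, name):
        self._check(identity, name, "read")
        return self._secrets[name]

    def audit_log(self):
        # Hand out an immutable copy so callers cannot edit history.
        return tuple(self._audit)
```

With a policy like `{("deployer", "API_KEY"): {"read", "write"}, ("web-01", "API_KEY"): {"read"}}`, the host `web-01` can read the key but any attempt to overwrite it raises `PermissionError` and still shows up in the audit log.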

It's (basically) the "privileged identity management" space; the challenge is that the commercial software in that market hasn't kept up with the combination of automated ops infrastructure and cloud-hosted dev tools. There are some ideas around how to do 1-3 better, with a devops/cloud-native design built in. (Full disclosure: I'm part of the founding team at a company doing this.)


The PIM software I've seen in enterprise (stuff even older than what Cyber-Ark has) has barely even kept up with software from the early 2000s let alone modern automated operations infrastructure. APIs that are written for XML-RPC and even XDR for crying out loud (that implies that even TCP was a tough sell for them). Automating them has been an exercise in incredible pain for few rewards.

Even AWS CloudHSM is not revolutionary conceptually as much as from a compliance and paperwork standpoint. I think there really needs to be emphasis on a (4): all secrets must be rotated on a semi-random schedule and revocable on demand. The goal is to make any credential valid only for a period shorter than what an attacker who is already present on your systems would need to further increase their presence or to compromise any of 1-4. Who cares if an instance is owned if it's up for maybe 10 minutes and can literally only communicate on a specific port to a specific server with a specific protocol?
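A rough sketch of what (4) could look like, in Python. The class name `ShortLivedCredentials` and its parameters are invented for this example and do not come from any product mentioned in this thread. Each token expires after a TTL, a random jitter shortens the lifetime so expiry is not perfectly predictable, and any token can be revoked immediately.

```python
import secrets
import time


class ShortLivedCredentials:
    """Hypothetical issuer of expiring, revocable tokens (illustration only)."""

    def __init__(self, ttl_seconds, jitter_seconds=0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._jitter = jitter_seconds  # upper bound on random shortening
        self._clock = clock            # injectable so expiry can be tested
        self._active = {}              # token -> expiry timestamp

    def issue(self):
        token = secrets.token_urlsafe(32)
        # Jitter only ever shortens the lifetime, never extends it.
        shorten_by = secrets.randbelow(self._jitter + 1)
        self._active[token] = self._clock() + self._ttl - shorten_by
        return token

    def revoke(self, token):
        # On-demand revocation: forgetting the token invalidates it at once.
        self._active.pop(token, None)

    def is_valid(self, token):
        expiry = self._active.get(token)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._active[token]
            return False
        return True

    def rotate(self, old_token):
        self.revoke(old_token)
        return self.issue()
```

With `ttl_seconds=600`, a stolen token is useless after at most 10 minutes, which is the window described above.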

Unfortunately, this is all only reasonable in a highly automated architecture, and it is basically impossible at almost every company I've ever seen that has ever uttered the mere word ITIL. Those companies tend to have people-driven cultures for everything, not the process-driven culture you need in order to be effective in cloud environments. (Most companies try to add policies that are so ineffectual and meaningless that everyone reverts to tribalism, similar to how everyone defaults to e-mail when collaboration tooling is ineffective.)

I do devops and security automation as well, and there's nothing self-serving about your points if you ask me.

I would definitely love to hear what you think of our stuff. Here's a link; I have chosen a description of how secrets can be stored, distributed over HTTPS, and wrapped with a script that exposes them as environment variables.

http://developer.conjur.net/tutorials/secrets/conjurenv.html
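For readers who have not seen the pattern, the general shape of "wrap a command and expose secrets as environment variables" is roughly the following Python sketch. This is not the implementation behind the link above; `run_with_secret_env` and the `fetch_secret` callable are hypothetical names, and `fetch_secret` stands in for whatever authenticated HTTPS call a real tool would make.

```python
import os
import subprocess
import sys


def run_with_secret_env(fetch_secret, mapping, argv):
    """Run argv with secrets injected into the child's environment only.

    mapping:      ENV_VAR name -> secret identifier
    fetch_secret: hypothetical callable returning the secret value as a str
    """
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    for var, secret_id in mapping.items():
        env[var] = fetch_secret(secret_id)
    # The secret lives only in this process's memory and in the child's
    # environment; nothing is written to disk by this wrapper.
    return subprocess.run(argv, env=env, check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    fake_backend = {"prod/api-key": "s3cr3t"}  # stand-in for a real store
    result = run_with_secret_env(
        fake_backend.__getitem__,
        {"DEMO_API_KEY": "prod/api-key"},
        [sys.executable, "-c", "import os; print(len(os.environ['DEMO_API_KEY']))"],
    )
    print(result.stdout.strip())
```

One design caveat: the child process, and by default anything it spawns, inherits those variables, so the wrapped command has to be trusted with them.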

Funny, we just had this hit the front page, arguing against environment variables for secrets: https://news.ycombinator.com/item?id=8826024

It's not clear from the doc you linked that you would support AWS STS, which is probably the right way to approach minimal privilege and to reduce the time window during which an attacker would hold the privileges of the compromised entity. I wish I had a way to calculate that window from the tools I have; knowing it helps drastically when sifting through network logs during an investigation.
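As a sketch of the STS idea, the Python below requests temporary credentials and computes how long they remain usable. It assumes a client object exposing the STS `assume_role` call (with boto3 that would be `boto3.client("sts")`); the helper names `assume_short_lived_role` and `seconds_remaining` are made up for this example, and boto3 itself is deliberately not imported so the snippet runs without AWS access.

```python
from datetime import datetime, timezone


def assume_short_lived_role(sts_client, role_arn, session_name,
                            duration_seconds=900):
    """Ask STS for temporary credentials with the shortest practical lifetime.

    sts_client is assumed to behave like boto3.client("sts").
    """
    # AssumeRole accepts 900 s up to the role's maximum (at most 43200 s).
    if not 900 <= duration_seconds <= 43200:
        raise ValueError("duration_seconds must be between 900 and 43200")
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.
    return response["Credentials"]


def seconds_remaining(credentials, now=None):
    """How long a stolen copy of these credentials would stay usable."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (credentials["Expiration"] - now).total_seconds())
```

The `Expiration` timestamp is also the number you would want during an investigation: it bounds the period in which log entries under that session could have come from an attacker.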

What you seem to have built so far is something that could be used to build a more modern shared-secret access stack, rather than a full solution in itself. Most companies that want to pay for something want it to rotate keys and passwords for them, or to enforce secrets policies such as separation of keys across different nodes in a high-availability setup (e.g., the DB, root, and LDAP cached passwords should not be stored on the same data node, even in encrypted form). Otherwise, a lot of companies have already built solutions equivalent to Conjur (with varying degrees of success, depending on how dysfunctional their IT already is). A lot of the custom solutions I'm familiar with in the Defense / IC space are starting to use Apache Accumulo to enforce a great deal of the sharing and storing of secrets. Its architecture makes it possible to have tables split both column-wise and row-wise across multiple nodes based on business rules like HIPAA, FISMA, PCI-DSS, etc. Tack on Zookeeper with some SASL and you'll spend the next year or two just arranging the meetings to figure out the security rules.
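The separation-of-keys policy mentioned above is easy to state as a check. A small Python sketch follows, with made-up node and secret names; `placement_violations` is a hypothetical helper, not part of Conjur, Accumulo, or any other tool named here.

```python
def placement_violations(placement, must_separate):
    """Find nodes holding two or more secrets that policy says to keep apart.

    placement:     node name -> set of secret names stored on that node
    must_separate: iterable of sets; members of one set may not share a node
    """
    violations = []
    for node, stored in sorted(placement.items()):
        for group in must_separate:
            clash = stored & set(group)
            if len(clash) > 1:
                violations.append((node, tuple(sorted(clash))))
    return violations


if __name__ == "__main__":
    placement = {
        "data-node-1": {"db_password", "root_password"},
        "data-node-2": {"ldap_cache"},
    }
    policy = [{"db_password", "root_password", "ldap_cache"}]
    # data-node-1 holds two secrets from the same must-separate group.
    print(placement_violations(placement, policy))
```

A paid product would presumably enforce this at write time rather than report it afterwards, but the rule itself is the same.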

For an analogy, it seems like you've built a lot of the workings of Postgres but are missing something important like procedural queries and triggers, while organizations really want an ORM (they just don't even realize it because the whole industry is built around bikeshedding topics in security). Build something respecting the vernacular and culture of engineers, IT opsec / compliance, and (more importantly) the managers of both orgs and you should have a winner. Ok, after you find the right sales guys to get the attention of some F500s that are in terrible industries racked by compliance BS 24/7.

All in all, good idea and it looks promising. I'll keep your product in mind if I can get a management tool like this even suggested. We're following some extremely bad practices at present in order to avoid violating OTHER no-nos about keeping stuff out of the public cloud. Our IAM across dozens and dozens of AWS accounts is completely bonkers, and the resulting bungling of credentials is probably causing worse security problems than if we just gave them all the same keypairs. It'd be really interesting to see this work seamlessly across both AWS-like environments and a vSphere/vCAC/vCD type of environment, using affinity / anti-affinity rules to make initial guesses about your security configuration. Pretty sure everything in an autoscaling group should by default be in the same group or "layer" (in your terminology), for example, and you could start with the same for vSphere compute clusters, unless host anti-affinity rules for a VM are present. Those usually mean that the VM is not allowed to cross a physical boundary, which hints at a business-level policy rather than a technical one (nobody does cross-geographic clusters besides Google last I saw, and you probably aren't going to be able to sell this to them....).
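The initial-guess heuristic described above could be sketched like this in Python. The record fields (`name`, `group`, `anti_affinity`) and the function `guess_layers` are assumptions made for illustration; a real version would read them from the AWS and vSphere APIs, which this snippet does not touch.

```python
from collections import defaultdict


def guess_layers(machines):
    """First-pass grouping of machines into layers.

    machines: list of dicts with keys
      'name'          - machine identifier
      'group'         - autoscaling group or vSphere compute cluster
      'anti_affinity' - True if a host anti-affinity rule applies (optional)
    Returns (layers, needs_review).
    """
    layers = defaultdict(list)
    needs_review = []
    for machine in machines:
        if machine.get("anti_affinity"):
            # Likely a business-level constraint, so do not guess; let a
            # human decide which layer this VM belongs in.
            needs_review.append(machine["name"])
        else:
            layers[machine["group"]].append(machine["name"])
    return dict(layers), needs_review
```

The point is only that the defaults are cheap to compute; the machines flagged for review are where the actual policy conversation has to happen.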

One thing that would tremendously help in your documentation would be to provide security scenarios for different user stories and potential users. Admins across multiple tenant business units have different use cases than developers who work in maybe one or two organizations / groups, for example. I found myself expecting an "I am a... X, Y, Z" set of tabs and wanted to see each of their use case scenarios for one or two sample companies with different needs. Besides the "I don't want to be your guinea pig" mentality, this is what companies are really looking for half the time they ask for a reference customer.