by tompic823 1965 days ago
This solves a very real problem that some services like GitHub [0] have started to address. Auth tokens are being committed to public repos at an alarming rate. Detecting this and ideally preventing it as early as possible is key to avoiding account compromise. There are two components to this: identification of a secret and attribution. Identification is non-trivial and requires determining if some text really is a secret and not just a random hash, uuid, or other high entropy string. Most tokens today are generic, alphanumeric patterns; false positives abound. Attribution is tricky too, currently relying on either parsing the variable name (`AWS_SECRET_KEY=XYZ`), commit message, file name, or some other metadata. In the rare case, a service will have designed their auth tokens with this in mind, prepending a unique, static prefix to their tokens.
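To illustrate why identification by entropy alone produces false positives, here is a minimal sketch of a naive entropy check. The function names and the threshold are hypothetical, picked only for illustration; real scanners combine several signals.

```python
import math
import uuid
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical cut-off, chosen only for this illustration.
ENTROPY_THRESHOLD = 3.0


def looks_like_secret(candidate: str) -> bool:
    """Naive heuristic: flag any string that is long and high-entropy."""
    return len(candidate) >= 20 and shannon_entropy(candidate) > ENTROPY_THRESHOLD


# A random UUID is not a credential, yet a check like this will usually flag it.
harmless_uuid = uuid.uuid4().hex
print(harmless_uuid, looks_like_secret(harmless_uuid))
```

The heuristic cannot tell a harmless identifier from a real token, which is the gap a recognizable prefix or scheme is meant to close.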

The URI scheme proposed in the linked RFC will squarely solve the first problem. It will allow for highly accurate CI scanners and pre-commit hooks. The scheme doesn't appear to address attribution, assuming all service providers use the same `secret-token` scheme. However, attribution is a nice-to-have, allowing for automated revocation once the secret has gone public. If done right, identification alone could be used to prevent most of the token leakage that occurs today.
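As a rough sketch of what such a scanner could look like, the following looks for `secret-token:` followed by the characters the RFC's grammar permits in a token. The names `scan_lines` and `main` are hypothetical, and this is an illustration under those assumptions, not a hardened tool.

```python
import re
from typing import Iterable, List, Tuple

# "secret-token:" followed by one or more pchar characters (RFC 3986):
# unreserved, sub-delims, ":", "@", or a percent-encoded octet.
SECRET_TOKEN_RE = re.compile(
    r"secret-token:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+"
)


def scan_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, matched_text) for every secret-token URI found."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for match in SECRET_TOKEN_RE.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


def main(paths: List[str]) -> int:
    """Scan the given files; return 1 if anything was found, else 0."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_no, token in scan_lines(handle):
                found = True
                # Show only a short prefix so the report does not leak the secret again.
                print(f"{path}:{line_no}: possible secret-token URI ({token[:16]}...)")
    return 1 if found else 0
```

A pre-commit hook could call `sys.exit(main(sys.argv[1:]))` on the staged files and block the commit on a non-zero exit status.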

[0] https://docs.github.com/en/developers/overview/secret-scanni...

3 comments

I hope it works out. What I do is create a file, usually called something like "DoNotCheckThisIntoSourceControl.swift", and put it in a directory called "DoNotCheckThisIntoSourceControl". I then add "DoNotCheckThisIntoSourceControl" to my .gitignore.

Clunky, but it works. I put things like server secrets and whatnot there. I keep the file small, and usually add the contents to a secure note in 1Password, so there is version control, of a sort.

You might be interested in:

git config --global core.excludesfile ~/.gitignore

You can have a system-wide (but local only) .gitignore. It doesn't help other people who clone your repo, but it can be useful in some situations.

No need to change the config; the default global ignore file path is ~/.config/git/ignore
Ah, thank you.
What I do is not put secrets in files in my source tree. If secrets have to go in files, they go in files somewhere well clear of any source control tool.
I built an e2e encrypted cloud service for secrets in case you’re interested in trying it: https://cloudenv.com
You might like git-crypt then, to add actual version control for your secrets
I do something similar, by having a pretty global exclude for folders called donotbackup in my backup tools. Quite useful.
As far as backup is concerned, a well-supported (by Borg, restic, and others) way of excluding directories is by putting a file that conforms to the CACHEDIR.TAG standard.

https://bford.info/cachedir/
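For anyone who wants to try this, here is a minimal sketch that tags a directory. The signature line is the fixed header that, to my understanding, the CACHEDIR.TAG spec requires at the start of the file; the helper name `tag_directory` is hypothetical, and it is worth checking the header against the linked page before relying on it.

```python
from pathlib import Path

# Fixed header the tag file must begin with (per the linked spec, as I understand it).
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_directory(directory: Path) -> Path:
    """Create `directory` if needed and drop a CACHEDIR.TAG file into it."""
    directory.mkdir(parents=True, exist_ok=True)
    tag = directory / "CACHEDIR.TAG"
    tag.write_text(
        CACHEDIR_TAG_SIGNATURE
        + "\n# Tagged so that backup tools can skip this directory.\n"
        + "# See https://bford.info/cachedir/\n",
        encoding="ascii",
    )
    return tag
```

Backup tools typically honor the tag only when asked to; Borg and restic, for example, each have an `--exclude-caches` option.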

Thanks! I wasn't aware of this. Mostly it's for Apple's Time Machine which doesn't support it, but this is neat to hear about.
I like this idea a lot and I am slightly annoyed that I didn’t think of it myself. Thanks for the contribution.
If people are looking for a way to put encrypted files into git, you can use LockGit https://github.com/jswidler/lockgit.
I agree. It would have been interesting to do something like secret-token:example.com/abcdef with the option of secret-token:example.com/auth/abcdef (where auth is an arbitrary token type picked by example.com)
Well, you can create your tokens with the structure "domain/authtype/code". I think that's a good idea, and is plainly allowed by the standard.

Yeah, it would be better if it was standardized too, but I didn't think about all the corner cases. Maybe it can't be standardized.

> I think that's a good idea, and is plainly allowed by the standard.

Watch out, "secret-token:domain/authtype/code" is not a valid secret-token by the standard!

The standard has this grammar:

  [RFC8959]
  secret-token-URI    = secret-token-scheme ":" token
  secret-token-scheme = "secret-token"
  token               = 1*pchar

  [RFC3986]
  pchar               = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded         = "%" HEXDIG HEXDIG
  unreserved          = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims          = "!" / "$" / "&" / "'" / "(" / ")"
                        / "*" / "+" / "," / ";" / "="

That means "/" cannot be part of a secret-token, and a strictly standard-compliant scanner will not pattern-match on "secret-token:domain/authtype/code".
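The effect is easy to see by translating the grammar above into a regular expression. This is a sketch of what a strictly compliant matcher might look like; the constant names are mine, not from either RFC.

```python
import re

# Character classes transcribed from the RFC 3986 rules quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# secret-token-URI = "secret-token" ":" 1*pchar
SECRET_TOKEN_URI = re.compile(rf"secret-token:{PCHAR}+")

line = 'token = "secret-token:domain/authtype/code"'
match = SECRET_TOKEN_URI.search(line)
print(match.group(0))  # secret-token:domain

# The whole string is not a valid secret-token URI under this grammar.
print(SECRET_TOKEN_URI.fullmatch("secret-token:domain/authtype/code"))  # None
```

The scanner still fires, but the matched text stops at the first "/", so the secret part of the token is not included in the match.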

I think that may be a design mistake as people will inevitably build tokens with those characters (using the same reasoning you did), and they won't show up in some scanners (any that are strictly compliant). "/" is allowed in query strings despite being a path delimiter before the query string; allowing it in secret-token would make sense too.

Fortunately it's not a security problem as long as "/"-delimited paths in tokens don't start with "/", because the preceding characters will be enough to match anyway. However, if you have a scanner where you whitelist some strings after being shown matches, the fact it doesn't match the security part of the token introduces a risk of mistakenly whitelisting too broadly (just the domain in this example), and of course there's a chance someone may use a path starting with "/" without realising this is a problem.

Ouch. I didn't check the grammar in the other RFC this one points to. I just assumed it was sane (even more so because the inline text explaining it makes no mention of the slash).

I really didn't expect somebody to define a URL as having a single URL part. Now my opinion is that this RFC is ill conceived.

You could add a bunch of %2F to stand in for slashes, but that's pretty clunky.
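A quick sketch of what that would look like, assuming the "domain/authtype/code" layout discussed above. The helper names are hypothetical, and the layout itself is not part of the standard.

```python
from typing import List
from urllib.parse import quote, unquote

PREFIX = "secret-token:"


def to_secret_token(domain: str, auth_type: str, code: str) -> str:
    """Join the parts with "/" and percent-encode the result as one token."""
    payload = "/".join([domain, auth_type, code])
    # safe="" makes quote() encode "/" as %2F instead of leaving it alone.
    return PREFIX + quote(payload, safe="")


def from_secret_token(uri: str) -> List[str]:
    """Reverse of to_secret_token(); returns [domain, auth_type, code]."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a secret-token URI")
    return unquote(uri[len(PREFIX):]).split("/", 2)


print(to_secret_token("example.com", "auth", "abcdef"))
# secret-token:example.com%2Fauth%2Fabcdef
```

Note that `quote` with `safe=""` also encodes characters such as ":" and "@" that the grammar would allow unescaped; the result is still valid, just more verbose than necessary.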
Is it? Section 2.2 of the spec (https://www.rfc-editor.org/rfc/rfc3986.txt) states that "/" is reserved and should be encoded, whereas "-" is unreserved.

Thinking further: I don't want a secret to identify what it's for. That increases odds it's used when leaked accidentally.

The very real problem is that anyone who checks a secret into source control should be held responsible for any data loss, including fines and jail time. Poor security practices that are obviously avoidable are inexcusable.
You think that one errant command line entry should subject an individual to personal liability that could run into the billions of dollars?

First, control inputs that can cause catastrophic damage require purpose built system-wide multi-layered controls to prevent from being accidentally applied.

Second, it is never the responsibility of a single individual (even the CEO) to ensure this engineering requirement is identified, scoped, budgeted, funded, fulfilled, and regularly tested.

Third, absent actual malice — specific intent to cause damage — the personal liability for a simple mistake caused by a single person should always be dramatically reduced relative to the damages, particularly when the accident was only possible due to lack of proper safety engineering, or an actual cascade of failures.

Fourth, if software developers are somehow supposed to shoulder personal civil liability for potentially billions of dollars of damages due to a single mis-typed command, the simple truth is that nobody would knowingly and willingly accept that job.

> You think that one errant command line entry should subject an individual to personal liability that could run into the billions of dollars?

Setting aside the hyperbolic dollar amount you’ve suggested (in North America, stories I’ve found about specific engineers being fined for structure collapses have been in the five-figure range[0][1][2])… sure, why not?

If a civil engineer accidentally writes down one load calculation incorrectly, doesn’t follow well-known safe design practices which would’ve caught the error, and it causes the structure to collapse, they do have personal liability. Why should software engineers have special immunity?

This industry has a terrible track record of self-policing when it comes to security, so maybe some added liability would help—and, to the OP’s point, there really is no way that a secret token should be finding its way into a public code repository except by failing to follow safe design practices.

[0] https://www.ehstoday.com/archive/article/21914808/engineers-...

[1] https://www.lexology.com/library/detail.aspx?g=3812f88d-8670...

[2] https://www.denenapoints.com/engineer-fined-errors-dallas-co...

> If a civil engineer accidentally writes down one load calculation incorrectly, doesn’t follow well-known safe design practices which would’ve caught the error, and it causes the structure to collapse, they do have personal liability. Why should software engineers have special immunity?

The government regulates civil engineering, because when civil engineers screw up, people die.

The same is true for certain categories of software – aviation, motor vehicles, medical devices, etc. You can argue about whether the regulations in those areas are good enough (incidents like the 737 MAX suggest not), but those flaws are arguably best addressed by improvements in those industry-specific regulatory processes rather than trying to regulate software engineering as a whole.

With your run-of-the-mill blog hosting software or SaaS app, software bugs don't kill people. Privacy violations, financial damages yes, but actual deaths no. Anyone who wants to propose increased government regulation of software engineering – to turn software engineering into a regulated/licensed profession like civil engineering or medicine, with the kind of personal liability attached that those professions have – is going to get a lot of pushback from businesses arguing that it adds expense without any great benefit. And I think it is going to be hard to find anywhere near enough political capital to overcome that pushback. Financial damage is insurable, and voters value their life and health far more than their privacy.

I don't remember being taught "never commit secrets into source control" at university. I'm sure that calculating loads is part of the civil engineering syllabus at most universities. Civil engineers are also required to be qualified and registered professionals, unlike software engineers.

There are myriad ways that data breaches can occur. Which ones am I liable for as a software developer? If I'm on the receiving end of a 0day exploit, am I still liable? Or if I'm targeted by the Russian/Chinese/North Korean/{{currentEnemy}} government?

To update an old teaching for modern times: He that is without security lapses among you, let him first impose sanction on those who have.

Or to put it more bluntly - all users have flaws, and if your system relies on users not having flaws for being secure, it isn't.