Hacker News
by chiefalchemist 3415 days ago
Computers are supposed to be good at what imperfect humans are not. This only proves how primitive the tool is.

That is, for example, if Gmail can ask "it looks like you forgot the attachment" why can't Git say "this is a public repo and you're about to commit and push passwords. Are you sure?"

It's going to be easier to fix the tool than it is to make humans be perfect.

Gmail can run a simple keyword search for a handful of phrases in something that's known to be text.

Git would first have to decide whether a file is a text file or a binary file, a decision that can be made reasonably well heuristically but that is undecidable in the general case. Then it would have to scan text files for a long, curated list of known keywords that are only used for storing API keys and are not (usually) used in normal code. I'm not sure that's even feasible.
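As a rough illustration of the text-versus-binary heuristic, here is a minimal Python sketch. It assumes the common rule of thumb (similar in spirit to what Git itself does) of treating content as binary if a NUL byte appears in its first few thousand bytes; the name `looks_binary` and the 8000-byte window are illustrative choices.

```python
def looks_binary(data: bytes, sniff_len: int = 8000) -> bool:
    """Guess whether `data` is binary by looking for a NUL byte near the start.

    Only a heuristic: UTF-16 text contains NUL bytes and gets misclassified,
    and a NUL byte beyond `sniff_len` goes unnoticed.
    """
    return b"\x00" in data[:sniff_len]


print(looks_binary(b"DB_PASSWORD = 'hunter2'\n"))  # plain ASCII text
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))  # start of a PNG file
print(looks_binary("hunter2".encode("utf-16")))    # text, but full of NUL bytes
```

The last call is the kind of case that keeps this a guess rather than a decision.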

And then of course git has no concept of "public" and "private" repos, so the entire task can't be handled well by git.

Exactly! So we agree that Git - as it is today - is not up to the task(s)? :)

That's all I'm trying to say.

We keep expecting a cat to bark, and then we're shocked and disappointed that it doesn't. So let's stop asking and find / build a better tool.

At least parsing and checking the commit message wouldn't be too hard, right?

I don't see a need for Git or GitHub to do this. It would be simple enough to set up your own Git pre-commit hook, which can be shared if you like.
While there may well be common trends, Git is a tool for arbitrary content, so it's going to be pretty hard to accurately detect passwords or secrets being committed. There are tools out there for more specific kinds of content, but expecting Git itself to catch them is a little much.

Regardless. The broader point is, Git is a screwdriver and what is needed at this point is a hammer. Sure, we can keep trying to pound nails with a screwdriver, but that's harder work and far less productive.

We. Need. A. New. Tool.

p.s. But there are how many CSS pre-processors? And how many JS frameworks? Etc. Things we don't need. Go figure.

I mean, I'd argue it's a screwdriver and you want an electric screwdriver. Sure, I applaud that effort, but it doesn't mean the original thing is bad, just that it could be improved.

How would Git know that it's a password/key/whatever?

I believe Django's error reports (the debug page, and the error emails its logging framework sends) will automatically replace values from your settings.py file (basically a dict) with asterisks if the setting's name "looks like" a secret (contains 'SECRET', 'KEY', 'PASS', etc.).
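For illustration, a minimal Python sketch of masking by key name, loosely modeled on the behavior described above. The pattern and the 20-asterisk placeholder resemble what Django's error reporting uses, but `cleanse`, `SECRET_NAME`, and `MASK` are hypothetical names, not Django's API.

```python
import re

# Setting names that "look like" secrets; the exact pattern is an assumption.
SECRET_NAME = re.compile(r"API|TOKEN|KEY|SECRET|PASS|SIGNATURE", re.IGNORECASE)
MASK = "*" * 20


def cleanse(settings: dict) -> dict:
    """Return a copy of `settings` with secret-looking values replaced by MASK."""
    cleaned = {}
    for name, value in settings.items():
        if isinstance(name, str) and SECRET_NAME.search(name):
            cleaned[name] = MASK
        elif isinstance(value, dict):
            cleaned[name] = cleanse(value)  # recurse into nested dicts, e.g. DATABASES
        else:
            cleaned[name] = value
    return cleaned


settings = {
    "DEBUG": True,
    "SECRET_KEY": "abc",
    "DATABASES": {"default": {"NAME": "app", "PASSWORD": "hunter2"}},
}
print(cleanse(settings))
```

Because the check looks only at names, a secret stored under an innocuous name slips through, while a harmless setting whose name happens to contain "KEY" gets masked.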
What would I do? I'd take ALL those incidents on GitHub and run them through some sort of AI pattern-recognition algorithm. That would become my identification "engine" (?).

It might not catch everything all the time - since humans are pretty creative when it comes to fucking things up - but I bet it would be pretty effective. Certainly more effective than what we have now. Then if it can keep learning going forward, all the better, eh.

Just make it search for files or variable assignments named "password" or "secret". That will catch the majority.
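A minimal Python sketch of that name-based search. The regular expression is a hypothetical starting point: it only looks for names containing password, passwd, or secret that are assigned a quoted string with `=` or `:`, so it covers some common Python, JavaScript, and YAML styles and misses many others.

```python
import re

ASSIGNMENT = re.compile(
    r"""(?ix)
    ([\w.-]*(?:password|passwd|secret)[\w.-]*)  # name containing a keyword
    \s*[:=]\s*                                  # '=' or ':' separator
    (['"])(.+?)\2                               # non-empty quoted string
    """
)


def find_secret_assignments(text: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs for string literals assigned to secret-looking names."""
    return [(m.group(1), m.group(3)) for m in ASSIGNMENT.finditer(text)]


source = '''
DB_PASSWORD = "hunter2"
api_secret: 'abc123'
password = get_password()
if password == "x":
    pass
'''
print(find_secret_assignments(source))
```

The function call and the `==` comparison are not flagged, because neither assigns a string literal.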

By comparison, Gmail doesn't catch all cases either: if you write something like "here are" instead of "I've attached", it misses it.

Keys are easy. The high entropy should tip you off.
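To make the entropy idea concrete, here is a minimal Python sketch that scores tokens by Shannon entropy (bits per character, estimated from the token's own character frequencies). The token pattern, the 4.0-bit threshold, and the 20-character minimum are assumptions chosen for illustration, not tuned values.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, from the character frequencies within `s` itself."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 4.0,
                        min_len: int = 20) -> list[str]:
    """Return tokens of at least `min_len` chars whose entropy exceeds `threshold`."""
    tokens = re.findall(r"[A-Za-z0-9+/_-]+", text)  # identifier/base64-like runs
    return [t for t in tokens
            if len(t) >= min_len and shannon_entropy(t) > threshold]


print(high_entropy_tokens('API_KEY = "q8Zf3LkP0xVb7RtY2mWn9CsD"'))
print(high_entropy_tokens("timeout = database_connection_timeout"))
print(high_entropy_tokens("timeout = database_connection_timeout", threshold=3.0))
```

The last two calls show the trade-off: a descriptive identifier like `database_connection_timeout` scores about 3.5 bits per character, so it is ignored at 4.0 but flagged at 3.0. Short, human-chosen passwords fall under `min_len` and are not caught at all.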

For passwords, look for variables named password or passwd that are assigned strings.

Like Gmail's attachment reminder, it'll get stuff wrong; just make it easy to continue anyway.

This. However, it would only work with secure passwords. Setting the entropy threshold too low would result in a bunch of false positives.

I'd say it's hard to implement this effectively. Maybe as a language- or framework-specific hook.

Good idea! It shouldn't be that hard to implement a simple prototype.

if "PASSWORD=xxx" in text => prompt alert or ask confirm
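A runnable version of that check, as a minimal Python sketch. It assumes an env-style `NAME=value` line whose name ends in PASSWORD is what to look for, and it takes the prompt function as a parameter (`ask`, a hypothetical name) so the Gmail-style "are you sure?" step can be swapped out or tested.

```python
import re
from typing import Callable

# Hypothetical rule: env-style lines such as PASSWORD=... or DB_PASSWORD=...
ENV_PASSWORD = re.compile(r"^\s*\w*PASSWORD\s*=\s*\S+", re.IGNORECASE | re.MULTILINE)


def confirm_commit(text: str, ask: Callable[[str], str] = input) -> bool:
    """Return True if the commit should go ahead.

    A match only triggers a question; answering 'y' or 'yes' continues anyway.
    """
    if not ENV_PASSWORD.search(text):
        return True
    answer = ask("This looks like it contains a password. Commit anyway? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


# Simulate a user who answers "n" instead of reading from the keyboard.
print(confirm_commit("DB_PASSWORD=hunter2\n", ask=lambda prompt: "n"))
```

Inside a real Git hook, standard input is typically not connected to the terminal, so the answer would have to be read some other way (for example by opening /dev/tty) rather than through `input()`.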

The next step would be to take this list (the search results), build a curated list of 100-1,000 unencrypted passwords (the text/line, the file, and info about the repo), and then hard-code some rules to detect 80%+ of cases.
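A minimal Python sketch of the "hard-coded rules" step, assuming each rule is a named regular expression and that success is measured as the fraction of known leaked samples flagged by at least one rule. The three rules and the sample strings are hypothetical placeholders, not a curated list or real data.

```python
import re

# Hypothetical rules; a real set would come from reviewing actual leaks.
RULES = {
    "env_password": re.compile(r"^\s*\w*PASS(?:WORD|WD)?\s*=\s*\S+", re.I | re.M),
    "quoted_secret": re.compile(
        r"(?i)\w*(?:secret|token|api_key)\w*\s*[:=]\s*['\"][^'\"]+['\"]"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def matched_rules(text: str) -> list[str]:
    """Names of the rules that match somewhere in `text`."""
    return [name for name, rx in RULES.items() if rx.search(text)]


def detection_rate(samples: list[str]) -> float:
    """Fraction of known-bad samples flagged by at least one rule."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if matched_rules(s)) / len(samples)


samples = [  # illustrative strings, not real leaked data
    "DB_PASSWORD=hunter2",
    'token = "abc123"',
    "-----BEGIN RSA PRIVATE KEY-----",
    "conn = connect(user, pw)",
]
print(detection_rate(samples))
```

Tracking the rate per rule against the curated list would show which rules pull their weight and which cases (like the last sample) need a different approach.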