Show HN: Sniffgit – A Python lib to find sensitive files and information in a repo (github.com)
3 points by LHardi 3088 days ago
2 comments

Nice start. I notice this only scans the HEAD of the repository. Have you considered implementing functionality to go back through previous commits and check for secrets in files there? After all, once something is committed to git, even if you change the file, the old version is still there (by design, obviously).
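
To make the suggestion concrete, here is a minimal sketch of one way to look at history: read the output of `git log -p --all` and keep only the lines each commit added. This is not sniffgit's code. The function names and the keyword list are hypothetical, and it assumes the `git` executable is on the PATH.

```python
import subprocess

# Illustrative keyword list, not sniffgit's actual one.
KEYWORDS = ("password", "api_key", "aws_secret_access_key")

def added_lines(patch_text):
    """Yield (commit, line) for every line added in `git log -p` output."""
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Header lines look like "commit <hash>"; message lines are indented.
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            # "+++ b/path" is a file header, not content. Side effect: an added
            # line whose own text starts with "++" is skipped too.
            yield commit, line[1:]

def scan_history(repo_path="."):
    """Return (commit, line) pairs from all commits that mention a keyword."""
    patch = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return [(commit, line) for commit, line in added_lines(patch)
            if any(k in line.lower() for k in KEYWORDS)]
```

Calling `scan_history(".")` inside a repository would list keyword hits from every commit reachable from any ref, including lines that were later deleted.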

For a more complex implementation of a solution to this problem, check out truffleHog [0], which "searches through git repositories for high entropy strings and secrets, digging deep into commit history."

[0] https://github.com/dxa4481/truffleHog
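
For anyone unfamiliar with the "high entropy strings" idea, a rough sketch of it is below. It computes the Shannon entropy of each whitespace-separated token and flags long tokens that look random. The function names, the minimum length and the threshold are assumptions chosen for illustration, not values taken from truffleHog.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def high_entropy_tokens(line, min_len=20, threshold=4.0):
    """Return tokens that are long and have entropy above the threshold.

    min_len and threshold are illustrative guesses and would need tuning.
    """
    return [tok for tok in line.split()
            if len(tok) >= min_len and shannon_entropy(tok) > threshold]
```

Ordinary words and short values fall below the length cutoff, while long random-looking keys tend to score high because almost every character is different.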

Hi there, a feature to scan previous commits sounds awesome and I'll start working on it soon!

truffleHog also provides a sophisticated approach to detecting potential secret strings.

Thank you for the feedback! :)

Hi there, I built this library after reading some InfoSec SE posts about which sensitive files (and information) should be gitignored or kept out of a git repo altogether.

The following article was also a motivation for me to start the project, “Dev put AWS keys on Github. Then BAD THINGS happened”: https://www.theregister.co.uk/2015/01/06/dev_blunder_shows_g...

How this library works: sniffgit starts from the root of your git working directory and checks whether any sensitive files (id_rsa, *.cert, etc.) are exposed, i.e. files that haven't been gitignored or files that shouldn’t be in a repo at all.
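
A simplified sketch of that file check, not the library's actual code, might look like the following. It walks the tree, skips `.git`, and matches file names against glob patterns. Only `id_rsa` and `*.cert` come from the description above; the other patterns and the function name are hypothetical, and the gitignore check is left out.

```python
import fnmatch
import os

# id_rsa and *.cert are from the post; the rest are illustrative additions.
SENSITIVE_PATTERNS = ("id_rsa", "*.cert", "*.pem", ".env")

def find_sensitive_files(root):
    """Return sorted paths (relative to root) whose names match a pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in SENSITIVE_PATTERNS):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```

A real tool would also need to drop files that git already ignores, for example by asking `git check-ignore`, which this sketch does not do.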

This library also checks text files for sensitive information, such as AWS_SECRET_ACCESS_KEY, email addresses, passwords, etc. Some files and directories are not read at all, though (e.g. binary files, .git, yarn.lock).

Currently, the “sensitive info / line analysis” produces a lot of false positives on larger projects. The reason is that it only checks each line of a text file for keywords such as “password”, “API_KEY”, “email”, etc.
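
To illustrate why a plain keyword match is noisy, here is a small sketch comparing it with a stricter pattern that only fires when a keyword is assigned a quoted value. Both functions are hypothetical, and the stricter regex is just one possible tightening, not something sniffgit is known to implement.

```python
import re

# Illustrative keyword list based on the examples in the post.
KEYWORDS = ("password", "api_key", "aws_secret_access_key", "email")

def naive_hits(lines):
    """1-based line numbers containing any keyword, case-insensitively."""
    return [i for i, line in enumerate(lines, 1)
            if any(k in line.lower() for k in KEYWORDS)]

# Keyword, optional identifier suffix, then ':' or '=' and a quoted value.
ASSIGNMENT = re.compile(
    r"(?i)(password|api_key|secret_access_key)\w*\s*[:=]\s*['\"][^'\"]+['\"]"
)

def stricter_hits(lines):
    """1-based line numbers where a keyword is assigned a quoted literal."""
    return [i for i, line in enumerate(lines, 1) if ASSIGNMENT.search(line)]
```

On a file with `password = "hunter2"`, an HTML `<input type="password">` and a comment that mentions email, the naive check flags all three lines and the stricter one flags only the first. The trade-off is that the stricter pattern misses unquoted values.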

This is my first ever open-source project. Feedback is truly appreciated, particularly about OSS best practices :).

Interesting project! Perhaps you could return a non-zero exit status when results are found (using sys.exit or something like that) so it can be integrated into CI pipelines.

Thank you for the suggestion! I will add that feature today. I believe that the project will be more useful if it can be easily integrated into CI pipelines!
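
A minimal sketch of the exit-status idea, assuming the scanner already produces a list of findings: `exit_code` and `main` are hypothetical names, not part of sniffgit.

```python
import sys

def exit_code(findings):
    """Return 1 when anything was found, 0 otherwise."""
    return 1 if findings else 0

def main(findings):
    """Report findings on stderr, then exit with a status a CI job can check."""
    for finding in findings:
        print(f"sensitive: {finding}", file=sys.stderr)
    sys.exit(exit_code(findings))
```

Most CI systems treat any non-zero exit status as a failed step, so the scan would fail the build without extra configuration.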