HN Mirror

	by midnitewarrior 1107 days ago
	What if the credentials got checked into GitHub, and GitHub Copilot is autocompleting those credentials in random coding sessions? Woof.

SparkyMcUnicorn 1107 days ago

I have an MIT-licensed GitHub repo (created in 2019) in which I purposely left keys, which I had deactivated before I even committed.

The repo is somewhat niche, and Copilot will (with some help) recreate nearly the entire repo, including the original repo's comments, but it won't generate the same keys no matter how hard I've tried.

I'm pretty sure there was at least some sanitization before it made its way into the model.
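As a rough illustration of what the simplest kind of sanitization could look like (the thread doesn't say how the training data was actually filtered, so this is not GitHub's or OpenAI's method), here is a minimal Python sketch that redacts quoted literals assigned to variables whose names contain key, secret, token, or password. The pattern, the helper name `redact_named_secrets`, and the sample value are all hypothetical.

```python
import re

# Hypothetical heuristic: a quoted literal assigned to a name containing
# key/secret/token/password is assumed to be a credential.
SECRET_ASSIGNMENT = re.compile(
    r"""(?P<name>\b\w*(?:key|secret|token|password)\w*)\s*=\s*(?P<quote>["'])(?P<value>[^"']*)(?P=quote)""",
    re.IGNORECASE,
)


def redact_named_secrets(source: str, placeholder: str = "REDACTED") -> str:
    """Replace string literals assigned to secret-looking names with a placeholder."""

    def _replace(match: re.Match) -> str:
        quote = match.group("quote")
        return f"{match.group('name')} = {quote}{placeholder}{quote}"

    return SECRET_ASSIGNMENT.sub(_replace, source)


print(redact_named_secrets('API_KEY = "sk_live_4f9a8b7c6d5e"'))  # API_KEY = "REDACTED"
print(redact_named_secrets('retries = "3"'))                     # retries = "3"
```

A filter like this only looks at variable names, which is cheap but easy to sidestep.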

hackernewds 1107 days ago

Your anecdata is not data, though. It's entirely possible to check in keys stored in variables that don't look like keys to any supposed AI.
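To make the point concrete: a name-based filter only sees the variable name, while a value-based check looks at the string itself. This is a hypothetical sketch (the helper names and sample variable names are made up). The two regexes approximate the published AWS access key ID shape (AKIA plus 16 uppercase alphanumerics) and GitHub's classic personal access token shape (ghp_ plus 36 alphanumerics), and AKIAIOSFODNN7EXAMPLE is AWS's documented example key. A key with no recognizable format still slips past both checks.

```python
from __future__ import annotations

import re

# Value-based patterns approximating two well-known credential formats.
# They inspect the string itself, so the variable name is irrelevant.
KNOWN_FORMATS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Hypothetical name-based heuristic, for comparison.
SECRET_NAME = re.compile(r"key|secret|token|password", re.IGNORECASE)


def name_based_hit(var_name: str) -> bool:
    """True if the variable name itself looks secret-ish."""
    return SECRET_NAME.search(var_name) is not None


def format_based_hits(value: str) -> list[str]:
    """Labels of every known credential format found in the value."""
    return [label for label, pattern in KNOWN_FORMATS.items() if pattern.search(value)]


# AWS's documented example key ID, imagined as stored in a variable named region_suffix.
print(name_based_hit("region_suffix"))            # False
print(format_based_hits("AKIAIOSFODNN7EXAMPLE"))  # ['aws_access_key_id']

# An opaque key with no recognizable format escapes both checks.
print(name_based_hit("blob"), format_based_hits("9f8e7d6c5b4a39281706"))  # False []
```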

throwawayadvsec 1107 days ago

Extremely unlikely.

LLM tokens are usually common words or parts of words, so it would be extremely weird for Copilot to output keys verbatim in generated code (I've actually tried a few times). At most it would produce random, invalid keys, since there are no real patterns in API keys.

Plus, I'd be shocked if they weren't automatically stripped from the training data.
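The "no real patterns" point is what entropy-based secret scanners lean on: random keys use their alphabet far more evenly than identifiers or English text do. A minimal sketch of stripping such strings, assuming a 20-character minimum, a 0-9A-Za-z_- alphabet, a 3.5 bits/char threshold, and a digit requirement, all picked for illustration rather than taken from any real training pipeline:

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, estimated from the string's own character frequencies."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


# Assumed candidate shape: a run of 20+ characters from a typical key alphabet.
CANDIDATE = re.compile(r"[A-Za-z0-9_\-]{20,}")


def looks_like_key(token: str, threshold: float = 3.5) -> bool:
    # Requiring a digit keeps long snake_case identifiers from being flagged.
    return any(ch.isdigit() for ch in token) and shannon_entropy(token) > threshold


def strip_high_entropy(text: str, placeholder: str = "<SECRET>") -> str:
    """Replace candidate tokens that look random with a placeholder."""
    return CANDIDATE.sub(
        lambda m: placeholder if looks_like_key(m.group()) else m.group(), text
    )


print(strip_high_entropy('client = Client("AKIAIOSFODNN7EXAMPLE")'))
# client = Client("<SECRET>")
print(strip_high_entropy("profile = get_user_profile_by_id(42)"))
# profile = get_user_profile_by_id(42)
```

Character entropy ignores order, so thresholds on short strings are noisy: a 40-character hex commit hash, or an obviously fake "0123456789abcdefghij", can also be replaced.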

russell_h 1107 days ago

I’m not sure how it’s implemented, but when Copilot suggests code with an inline API key or similar, it seems to reliably generate a sequential alphanumeric string that is distinguishable at a glance from real data.

I’m sure there are edge cases, but I’ve been surprised how well it handles this.
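One way a tool could tell that kind of placeholder apart from a real-looking key is to check how often adjacent characters simply count upward. This is a hypothetical heuristic, not how Copilot does it; the comment doesn't show the exact placeholder shape, so a value like 1234567890abcdef, the 0-9a-z ordering, and the 0.8 ratio are all assumptions.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def is_step(a: str, b: str) -> bool:
    """True if b directly follows a in 0-9a-z order (case-insensitive), or a 9->0 wrap."""
    a, b = a.lower(), b.lower()
    if a not in ALPHABET or b not in ALPHABET:
        return False
    return ALPHABET.index(b) - ALPHABET.index(a) == 1 or (a, b) == ("9", "0")


def looks_sequential(value: str, min_ratio: float = 0.8) -> bool:
    """Flag values whose adjacent characters mostly count upward, like placeholder keys."""
    if len(value) < 2:
        return False
    steps = sum(is_step(a, b) for a, b in zip(value, value[1:]))
    return steps / (len(value) - 1) >= min_ratio


print(looks_sequential("1234567890abcdef"))      # True
print(looks_sequential("AKIAIOSFODNN7EXAMPLE"))  # False
```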