Hacker News
by jasode 1275 days ago
>I had no idea they had a program called "secret scanning" and that it's actually beneficial.

FYI, this feature was also previously in the news for public repos: https://techcrunch.com/2022/12/15/github-brings-free-secret-...

>So I obviously assumed they're letting China scan my private repos.

To clarify, it's Microsoft/GitHub doing the scanning of private repos on behalf of the partners. They're just forwarding the tokens that match the partners' regexes.
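A minimal sketch of that flow, assuming a simple design: partner-supplied patterns are run over file content and only the matched substrings are forwarded, never the rest of the file. The pattern name `mycompany_api_token`, its regex, and the `scan_content` helper are hypothetical illustrations, not GitHub's implementation or its (unpublished) pattern list.

```python
import re
from dataclasses import dataclass

# Hypothetical partner patterns: pattern name -> compiled regex.
# Real partner regexes are not published; this one is made up for illustration.
PARTNER_PATTERNS = {
    "mycompany_api_token": re.compile(r"\bmc_[A-Za-z0-9]{20}\b"),
}


@dataclass
class Match:
    token: str  # only the matched substring is forwarded
    type: str   # the partner's pattern name
    url: str    # location of the file containing the match


def scan_content(content: str, url: str) -> list[Match]:
    """Return one Match per pattern hit; the rest of the file is never included."""
    matches = []
    for name, pattern in PARTNER_PATTERNS.items():
        for m in pattern.finditer(content):
            matches.append(Match(token=m.group(0), type=name, url=url))
    return matches


if __name__ == "__main__":
    sample = "api_key = 'mc_ABCDEFGHIJKLMNOPQRST'\nprint('hello world')\n"
    for hit in scan_content(sample, "https://example.com/foo/bar.txt"):
        print(hit)
```

Note that the `print('hello world')` line never leaves the scanner; only the token-shaped substring does.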

Yeah, I read the article and the comments on HN, so I know what it's about now. I still think they (not HN) should change the title to explain what secret scanning means.

Edit: how about dropping the corporatese and titling it "GitHub will now scan public repos for secret WeChat tokens"?

Yeah, but you don't get panicked click-throughs with that wording.
Hmm, it's a pity the GitHub blog doesn't run any advertising...
Actually, there's an even better phrasing further down the thread: just "for WeChat credentials", with no mention of secret scanning.
This is 100x better than the original!
Like .* ?
>Like .* ?

Assuming your question is not a joke...

The partner has to email the regex to secret-scanning@github.com for GitHub's approval. See the steps at: https://docs.github.com/en/developers/overview/secret-scanni...

Once it's in the scanning system, the partner receives JSON alert messages such as:

  [
    {
      "token":"NMIfyYncKcRALEXAMPLE",
      "type":"mycompany_api_token",
      "url":"https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt",
      "source":"content"
    } 
  ]
So instead of "token":"NMIfyYncKcRALEXAMPLE", private repo owners would worry about a '.*' regex leaking full source code rather than API credentials, e.g. "token":"#include <stdio.h>\nmain(){\nprintf("hello world");\n}".

The above scenario requires believing the following:

- Microsoft/GitHub is technically incompetent, and an employee and/or its internal regex sanity-checking tool will blindly accept an open-ended regex like '.*'

- MS/GitHub will then allow that unbounded regex to leak petabytes of private source code to Chinese partners via the JSON "token" field. (GitHub says it has 18+ petabytes of data, and most of that is private repos: https://twitter.com/github/status/1569852682239623173)

If one finds the above scenario realistic and believes their entire private repo source code is at risk of being leaked to Tencent through the '.*' threat, I assume the answer is to delete the repo.
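GitHub doesn't publish what its regex review actually checks, so the following is purely a hypothetical sketch of the kind of sanity check this argument assumes exists: reject patterns that fail to compile, match the empty string, match ordinary secret-free text, or produce unbounded matches. The function name `is_acceptable_pattern`, the probe strings, and the 255-character cap are all made up for illustration.

```python
import re

MAX_MATCH_LEN = 255  # hypothetical cap on how long a single "token" may be

# Hypothetical probe inputs: ordinary text that should never look like a secret.
PROBES = [
    "",
    "hello world",
    '#include <stdio.h>\nmain(){\nprintf("hello world");\n}',
]


def is_acceptable_pattern(pattern: str) -> bool:
    """Reject open-ended patterns such as '.*' before they reach the scanner."""
    try:
        regex = re.compile(pattern)
    except re.error:
        return False  # not even a valid regex
    # A secret pattern must not match the empty string ('.*' fails here).
    if regex.fullmatch("") is not None:
        return False
    # It must not match anywhere in ordinary, secret-free text ('.+' fails here).
    if any(regex.search(probe) for probe in PROBES):
        return False
    # On a long run of token-like characters, any match must stay short.
    m = regex.search("A" * 10_000)
    if m is not None and len(m.group(0)) > MAX_MATCH_LEN:
        return False
    return True
```

With these checks, `is_acceptable_pattern(".*")` is rejected outright, while a narrow, fixed-length pattern such as `r"mc_[A-Za-z0-9]{20}"` passes.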

https://docs.github.com/en/code-security/secret-scanning/abo... is pretty damn clear that secret scanning on private repos only alerts owners; only public repo scans alert partners (for instant revocation).
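The rule described there fits in a few lines. This hypothetical sketch models only that rule: findings in a private repo produce an owner alert, while findings in a public repo are batched into a partner payload shaped like the JSON alert quoted earlier in the thread. The `Finding` type, `route_findings` function, and the "owner"/"partner" labels are illustrative, not GitHub's API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    token: str
    type: str
    url: str
    source: str = "content"


def route_findings(findings: list[Finding], repo_is_public: bool) -> dict[str, str]:
    """Return the payload each (hypothetical) recipient would get.

    Private repos: only the repository owner is alerted.
    Public repos: the partner receives a JSON batch so it can revoke the tokens.
    """
    if not findings:
        return {}
    if repo_is_public:
        return {"partner": json.dumps([asdict(f) for f in findings])}
    return {"owner": f"{len(findings)} secret(s) found in this private repository"}
```

In the private case nothing in the returned dict is addressed to a partner, which is the point the docs make.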
I don't think GitHub will send back the matching string, just the names of the repos.
That seems bad? Look for /winnie/i in all private repos. The repo name includes the owner. Then go and arrest them.
Devil's advocate: I read recently that GitHub is being used to circumvent censorship in China. Does this system of letting them provide regexes allow China to automatically obtain lists of users who mention certain words or phrases? Or is that nonsense?
> Or is that nonsense?

Yes, that is nonsense.

1) Secret scanning can be disabled (I'm not even sure it's enabled by default).

2) The regexes are fairly specific, length-limited, etc.

3) GitHub is obviously reviewing the regexes it accepts.

Check the list of stuff supported: https://docs.github.com/en/code-security/secret-scanning/sec...

It's a bit sad that they don't publish the list of regexes, etc.

--------------

I added a similar check to the package manager for Dart / Flutter, because we saw users accidentally publishing secrets. That code is public; it relies on regexes and entropy estimation:
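To illustrate the entropy half of that approach: a Shannon-entropy estimate over a candidate string scores random-looking keys, which use many distinct characters fairly evenly, higher per character than repetitive text. This is not the code from the linked pub repository; the helper names, the minimum length of 20, and the 3.5 bits-per-character threshold are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, estimated from the character frequencies in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_like_secret(candidate: str, min_len: int = 20, threshold: float = 3.5) -> bool:
    """Flag long, high-entropy strings (hypothetical length and entropy cutoffs)."""
    return len(candidate) >= min_len and shannon_entropy(candidate) >= threshold
```

Entropy alone also flags things like hashes and base64 blobs, which is why it is typically paired with regexes that narrow down the candidate strings first.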

https://github.com/dart-lang/pub/blob/eb8ee21a089ebe0f2c2dd8...

It was heavily inspired by the research in: https://www.ndss-symposium.org/wp-content/uploads/2019/02/nd...

Worth a read, and certainly provides motivation for Github to do this kind of work :D

(disclosure: I work for Google. The opinions stated here are my own)

Once again [1][2], secret scanning alerts on private repos are only sent to owners, whereas public repos are, you know, public.

It's really tiring when people correct other people's misinformation without having read the bold bullet points in "Learn more about secret scanning" [3] themselves, and end up totally missing the point.

[1] https://news.ycombinator.com/item?id=34067335

[2] https://news.ycombinator.com/item?id=34067625

[3] https://docs.github.com/en/code-security/secret-scanning/abo...

Ah yeah, that's a good point.

Honestly, I'm just very happy GitHub is doing this, because we've all made these mistakes. And it's so easy for them to hide in git revision history, only to be found when someone scans for the secrets.

I had the same reaction. This seems like the plan to scan pictures on iPhones for CSAM; it would not be hard to add extra patterns that match material beyond the original intent.

Are the secret patterns all publicly available? Or are the secret scanning patterns themselves secret? Without public review, we cannot know which secrets they will obtain.

I, for one, do not trust GitHub/Microsoft to act in the interest of the average user. Their past actions disqualify them from receiving any benefit of the doubt.