by nottorp 1275 days ago
Brilliant title for the article.

Even though I'm a paid github customer, I had no idea they had a program called "secret scanning" and that it's actually beneficial.

So I obviously assumed they're letting China scan my private repos.

They really need to work on wording.


>, I had no idea they had a program called "secret scanning" and that it's actually beneficial.

Fyi... this feature was also previously mentioned in the news for public repos: https://techcrunch.com/2022/12/15/github-brings-free-secret-...

>So I obviously assumed they're letting China scan my private repos.

To clarify, it's Microsoft/Github doing the scanning of private repos on behalf of the partners. They're just forwarding the tokens that match the partners' regexp.

Yeah I read the article and the comments on HN so I know what it's about now. I still think they (not HN) should change the title to include what secret scanning means.

Edit: how about dropping the corporatese and title it "github will now scan public repos for secret WeChat tokens"?

Yea, but you don't get panic clickthru with this message.
hmm it's a pity github blog didn't have any advertisement...
Actually there's an even better term below in the thread. Use just "for WeChat credentials". No mention of secret scanning.
This is 100x better than the original!
Like .* ?
>Like .* ?

Assuming your question is not a joke...

The partner has to email the regex to secret-scanning@github.com for their approval. See the steps at: https://docs.github.com/en/developers/overview/secret-scanni...

Once it's in the scanning system, the partner receives JSON message alerts such as:

  [
    {
      "token":"NMIfyYncKcRALEXAMPLE",
      "type":"mycompany_api_token",
      "url":"https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt",
      "source":"content"
    } 
  ]
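
For illustration, here is a minimal sketch of how a partner might read such an alert body. The helper name tokens_to_revoke is hypothetical, and a real endpoint would also have to authenticate the request before trusting it, which this sketch skips.

```python
import json

# Sample alert body, copied from the example above.
payload = """[
  {
    "token": "NMIfyYncKcRALEXAMPLE",
    "type": "mycompany_api_token",
    "url": "https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt",
    "source": "content"
  }
]"""


def tokens_to_revoke(body: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (type, token) pairs from an alert body."""
    alerts = json.loads(body)
    return [(alert["type"], alert["token"]) for alert in alerts]


found = tokens_to_revoke(payload)
```

The point is that each alert carries only the matched string plus where it was found, so what ends up in "token" depends entirely on how tight the partner's regex is.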
So instead of '"token":"NMIfyYncKcRALEXAMPLE",' the worry for private repo owners is that a '.*' regex would leak full source code rather than API credentials, for example '"token":"#include <stdio.h>\nmain(){\nprintf("hello world");\n}",'

The above scenario requires believing the following:

- Microsoft/Github is technically incompetent and an employee and/or their internal regex sanity checking tool will blindly accept open-ended regex like '.*'

- MS/Github will then allow that unbounded regex to leak petabytes of private source code out to China partners via the JSON "token:" response. (Github says they have 18+ petabytes of data and most of that is private repos: https://twitter.com/github/status/1569852682239623173)
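
To make the "regex sanity checking" idea concrete, here is a rough sketch of the kind of check such a tool could run. This is purely an assumption about how one might do it, not GitHub's actual review process; the probe strings and the name looks_unbounded are made up for the example.

```python
import re

# Hypothetical probe strings that no credential pattern should match in full:
# the empty string, trivial text, and ordinary source code.
PROBES = ["", "a", "#include <stdio.h>", "hello world", "x" * 500]


def looks_unbounded(pattern: str) -> bool:
    """Return True if the pattern fully matches any non-credential probe."""
    rx = re.compile(pattern)
    return any(rx.fullmatch(probe) is not None for probe in PROBES)


open_ended = looks_unbounded(".*")
specific = looks_unbounded(r"mycompany_[A-Za-z0-9]{20}")
```

A pattern like '.*' fails this immediately, while a prefixed, fixed-length token pattern passes. A human reviewer on top of a check like this is what the scenario above assumes away.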

If one believes their entire private repo source code is at risk of being copied to Tencent via the '.*' threat because the above scenario seems realistic, I assume the answer is to delete the repo.

https://docs.github.com/en/code-security/secret-scanning/abo... is pretty damn clear that secret scanning for private repos only alerts owners; only the public repo scans alert partners (for instant revocation).
I don’t think GitHub will send back the matching string, just the name of the repos
That seems bad? Look for /winnie/i in all private repos. The repo name includes the owner. Then go and arrest them.
Devil's advocate: I read recently that GitHub is being used to circumvent censorship in China. Does this system of allowing them to provide regexes allow China to automatically obtain lists of users who are mentioning certain words or phrases? Or is that nonsense?
> Or is that nonsense?

Yes, that is nonsense.

1) secret scanning can be disabled (not even sure it's enabled by default). 2) the regexes are fairly specific, length limited, etc. 3) github is obviously reviewing regexes that are accepted.

Check the list of stuff supported: https://docs.github.com/en/code-security/secret-scanning/sec...

A bit sad, they don't publish the list of regexes, etc.

--------------

I added a similar thing to the package manager for Dart / Flutter, because we saw users accidentally publishing secrets. That code is public, it relies on regexes and entropy estimation:

https://github.com/dart-lang/pub/blob/eb8ee21a089ebe0f2c2dd8...

It was heavily inspired by the researchers in: https://www.ndss-symposium.org/wp-content/uploads/2019/02/nd...

Worth a read, and certainly provides motivation for Github to do this kind of work :D
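
A minimal sketch of the regex-plus-entropy idea, for anyone curious. This is not the pub implementation linked above; the candidate pattern, the 3.0 bits-per-character threshold, and the function names are all assumptions picked for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical candidate pattern: runs of 20 or more token-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9_\-]{20,}")

# Hypothetical threshold, in bits per character.
MIN_ENTROPY = 3.0


def shannon_entropy(s: str) -> float:
    """Estimate Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def likely_secrets(text: str) -> list[str]:
    """Return candidate strings whose entropy is at or above the threshold."""
    return [m for m in CANDIDATE.findall(text) if shannon_entropy(m) >= MIN_ENTROPY]


hits = likely_secrets("key = NMIfyYncKcRALEXAMPLE\npad = aaaaaaaaaaaaaaaaaaaaaaaa\n")
```

The regex narrows things down to strings shaped like tokens, and the entropy check drops low-randomness matches such as padding or repeated characters, which cuts down on false positives.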

(disclosure: I work for Google. The opinions stated here are my own)

Once again[1][2], scanning alerts on private repos are only sent to owners. Whereas public repos are, you know, public.

It's really tiring that people correct other people's misinformation when they themselves haven't read the bold bullet points in "Learn more about secret scanning"[3] and end up totally missing the point.

[1] https://news.ycombinator.com/item?id=34067335

[2] https://news.ycombinator.com/item?id=34067625

[3] https://docs.github.com/en/code-security/secret-scanning/abo...

Ah yeah, that's a good point.

Honestly, I'm just very happy GitHub is doing this, because we've all made these mistakes. And it's so easy for them to hide in git revision history, only to be found when someone scans for the secrets.

I had the same reaction. This seems like the plan to scan pictures on iPhones for CSAM; it would not be hard to add extra patterns that match materials beyond the original intent.

Are the secret patterns all publicly available? Or are the secret scanning patterns themselves secret? Without public review, we cannot know what secrets they will obtain.

I for one do not trust GitHub/Microsoft to act in the interest of the average user. Their past actions disqualify them from receiving any benefit of the doubt.

Yeah, you have to read the article to realize that "secret" is a noun in this case, not an adjective...
Seems unbelievable how they'd fumble the title when they could just as easily have called it "secrets scanning" and it would be OK.
Or maybe “secret detection”

Though perhaps that’s just my own bias on the subtle differences in the meanings of those words.

Maybe it's intentional, to generate traffic.
I think the name of the service is a bit ambiguous; they could've called it "Access Key Scanning" or even just "Secret*s* Scanning". Even capitalizing it would set it apart as a service instead of regular words in a sentence.
Credential scanning.

It's not scanning that they're doing in secret. Credential scanning removes the ambiguity

Scanning repos for secrets has been a thing for a while now. But seeing Tencent might put people on edge.
Secret scanning is a thing.

But this is an excellent next step where they build an integration with these partners where, as soon as a secret is scanned, they can notify tencent/AWS/other providers automatically to instantly invalidate those keys before they’re abused.

That’s what’s novel here.

I wasn’t commenting about whether it was novel or standard practice. OP seemed to have gotten the heebie-jeebies from the submission title.
China doesn’t even have to scan now, Github is going to sort it out and send it all for us. Sounds bad.
There's never been a better time to migrate your projects away from corporate control
Actually the best time expired several years ago. Also prevention is better than cure.
Not untrue, but if one migrates their repos now, they will create new projects on their new home, which will start to break the cycle.
Hmm, 2022 (2023 soon!) and people are jumping and screaming on their first and incorrect reaction to some headline (not you, but scroll down for a comment doing just that). God Bless The Internet! /s