Hacker News new | ask | show | jobs
I found 10k GitHub repositories distributing Trojan malware (orchidfiles.com)
211 points by theorchid 4 hours ago
17 comments

I have to say, the principle that open-source software can't do anything nefarious because the source is open just hasn't held up for a lot of reasons -- including that nobody has the time to inspect the code, let alone ensure that it matches the binaries; and also that GitHub has become a distribution hub for software used by lots of people with no ability or interest in auditing the software they use.
The choice is between code you can validate and code you can't, not code that has malware and code that doesn't.
> the principle that open-source software can't do anything nefarious because the source is open just hasn't held up for a lot of reasons

You've been living on such a principle? That sounds insane, why would something not be nefarious just because you can read the code?

The way I was "raised" by FOSS greybeards screaming at me through web forums, was that any software available on 3rd party websites anyone can upload anything to, will be filled with viruses and malware, and this was early 2000s. Surely people still advocate for this mindset today, when it's even more likely?

> You've been living on such a principle? That sounds insane

Fun fact, I've spent the last few days fretting over whether to add H2 to my FabricMC mod. The problem being that I don't know what class-loading shenanigans could possibly occur if I jar-in-jar include it: what happens if another mod has H2 jar-in-jar included? Will my mod only reference its own version of H2? What implications [if any] would that have? Or will the Fabric Loader pick one? What if another mod has H2 shaded instead? Will the classes clash differently? What if, instead of jar-in-jar including it, I shade and relocate it? Does H2 or JDBC rely on reflection or services that would render it non-functional?

All recommendations point to using/creating a mod specifically for that library and depending on it. As luck would have it, one already exists on Modrinth. Except... I'm then requiring anyone who trusts my mod to also install this other mod that I have no control over. I just looked at the source code and it looks fine, but that's if you trust that the published jars are the exact result of that source code: maybe there's something malicious in the Gradle Wrapper binary. This mod could at any time become malicious and how would I detect that?

Guess what? I asked around and was summarily told to stop worrying, that it's fine. We on this website need to realise that we're a minority: NO ONE is routinely (or even occasionally) scrutinising the source code of the stuff they install from third-party websites. I have never, not once, seen anyone hash a downloaded file to check that it matches what's on the website. At the very most, I've seen people find the Github repo, see that it has a lot of stars, and then assume it's safe.

No, I've not been "living on" such a principle but it was a big claim for "the bazaar."
Aha, wasn't that argument more about that closed source software is more likely to hide stuff you don't agree with, than FOSS? Not necessarily that FOSS won't have any viruses or malware, but it's at least less likely. That was my take away, but long time ago I read the book admittedly, I might misremember or transformed it automagically over time.
This is my takeaway as well. Having the source code open makes it auditable, if not by you, maybe the community.

The free software license specifically gives the software an extra advantage in that changes to the software must be shared openly, if distributed as as binaries.

> source code open makes it auditable, if not by you, maybe the community

I think part of why this social engineering works so well is it takes advantage of that "many eyes" trust, where people are prone to delegating the responsibility of checking to the community and not do due diligence on themselves. I know I'm susceptible to it if I see a Github repo with more than 10k stars on it.

You'd better read it again, because that claim does not figure in that text. You might mean that with more eyes on the code, more bugs are found, than with no eyes on the code. But that is not what you are saying here.
> You've been living on such a principle?

I have not, but in case you missed it, this principle has been used by open source proponents for decades. I'm an open source developer myself, but always found it odd.

No, it's really not, and really hasn't been. Do people truly have such poor reasoning and logic skills?

"Closed source software is inscrutable, impossible for me to fix, impossible for me to review the source" is absolutely a distinct statement from "it is impossible to hide malware in open-source software". I've literally never heard someone claim the latter.

(edit for coherency, thanks graemep)

> "it is impossible to hide malware in open-source software"

No nobody said "exactly that". But many times I've seen people claiming to trust open source as it is safer and people can check and build themselves. Seen it too many times. But reality is different than what is claimed.

I think you mean open source in the second bit in quotes.
This is not the argument at all. It's just easier to discover malware in closed software.
The problem the article is describing seems to have little to do with open source. There were GitHub repositories that had links added in their READMEs to a zip file containing compiled binaries.

GitHub is not a curated software repository. It's essentially no different from some random stranger linking to some binaries on a forum. (There are communities that seem to have no concerns about running unknown binaries from strangers in forum threads, but I wouldn't recommend it.)

What's opensource about this?

  - Application.cmd or Launcher.cmd
  - loader.exe or luajit.exe or another_name.exe
  - random_name.cso or random_name.txt
  - lua51.dll
All of the content are binaries or launcher scripts.
Ironically, one of the promises of AI: enough eyeballs.

The catch is the eyeballs can also be used to generate exploits.

I think that this is becoming increasingly true only for large, well-known repositories, where the maintainers have a lot to lose by doing anything shady. I don't think the React team could get away with doing something like that, for example.
Not true. If statistics offer a “measure” of reality, my guess is that “OS doing nefarious things” must fall between 0,005% and 0,007%. In any case compared to the extracted value it’s … nothing.
If all projects on github were closed source with public "trust me bro" binaries the situation would be of course much better.
"Trust me bro" is what people say about open source everywhere when it's not true.
I uploaded a sample found here (https://github.com/alexct142010-cell/McBackuper ) to Genus Codes (need an account): https://genuscodes.com/results/7ad4b911d05a12f91ab27ba3baa35... Seems to be related to the disco trojan family, by way of normalized function matching at 50% to malicious file https://genuscodes.com/results/eddbc29db4677e00c1a901aadbadb... and a normalized 50% match to https://genuscodes.com/results/fdb6cff68a2a8c08779d64a7cf61d...

Virustotal link: https://www.virustotal.com/gui/file/fdb6cff68a2a8c08779d64a7...

> I typed the project name into Google, and my repository appeared in the results. I entered the same query into Bing, and someone else’s repository appeared in the results

Side story, this kind of thing is what made me stop using Bing.

I had been using it as the default for searches (it sucks, but it's at least not Google), until I landed on a phishing page for my bank (I haven't committed it to memory yet). The page was a near perfect copy, and I would easily have gotten pwnd by it if they didn't have a modal asking me to run some code in my terminal for "security activation" that made me go "that's a little odd... Is this the right address OH SHIT that's a .ru domain"

I never see Google return phishing pages or typo squatters in the first page. Bing constantly returns that stuff in the first several results.

This is where password managers are useful because they would refuse to fill in login information since the domain doesn't match
I use keepass (FOSS under GPL, fully offline).

It does not detect domains.

KeepassXC browser integration will do that.
"Dang, this site isn't working right with the password manager's detection. Guess I just gotta paste the password in again..."

Meanwhile U2F/Passkeys can't possibly be abused like this.

Yeah but the downsides of passkeys make them so much worse anyway.
Pretty happy with having a yubikey on my keychain. Log in someplace new? plonk in your yubikey and off you go!
I used to keep a yubikey in a spare slot on my laptop. One day it fell out and subsequently escaped through an unnoticed hole in my backpack.

I've never lost a password because my backpack was overly abused.

And when your keychain gets lost then what?
Exactly. All these ideals work in theory but then in reality banks are also incompetent and will use all kinds of domains.

Same with meta and Google where they often direct you to domains that aren't under their main one and it's actually legit, but there's no way to know. It's impossible to teach family members to pay attention if it's really that domain because it's often legit not that domain.

    at least not Google
Is one giant mega-corp better than any other?

You're going to have a hard time convincing me the answer is yes.

Why would you go to your bank by first searching for it? Sounds very insecure to me. I type my banks url directly instead, or if that gets tedious, store it as a bookmark.

I know several people who search for important sites, click uncritically on links, and get scammed. This is not so good.

speaking only to search quality: try Kagi.
> Why do they delete a commit and push a new one every few hours?

May be to make it appear on the top of the "Last Updated" repositories in case someone searches for the repo or a keyword. So instead of the author's actual repo, the users endup cloning the trojan infected one.

Bingo!
A year ago a similar attack was reported and I think that there have been similar campaigns reported this year: https://github.com/evilsocket/opensnitch/discussions/1290#di...

  - This is a new repository, not a fork
  - All repositories have different contributors and different names
  From the last two points, it becomes clear that even if we find one such repository, we won’t be able to find other similar repositories using it.
In previous campaigns the repositories were linked to a few users. But those users had starred other users, that at the same time had also cloned other repositories with the malware. Sometimes the malicious repository had been cloned from another malicious repo, and if you listed the repositories and "friends" of that user, all were part of the botnet.

Also, github doesn't delete repositories and accounts, they mark them as deleted. If you use their api you can still list them.

It happened a few times to me that I'd find some very well constructed scam scheme (cryptocurrency washing systems, web platform/phishing scams), then I'd research deeper into it to see how it worked, just to ultimately feel powerless not knowing what to do with the information.
This is what a community is for!

No individual person can be the superhero that saves the day on everyone's behalf. But what we can do is provide what little help or insight that we have, and then pass the issue along to others.

Perhaps all it means is that you end up doing what OP did: the "deeper" research that you mentioned plus a little post on Hacker News or elsewhere.

Even if nothing comes of it in the end, at least you'll have tried.

>The zip archive contains 4 files: Application.cmd or Launcher.cmd loader.exe or luajit.exe or another_name.exe random_name.cso or random_name.txt lua51.dll If you submit a link to the archive to VirusTotal, it will find 0 viruses. If you submit the zip file itself, it will detect a Trojan inside it.

MS Windows

Any open source tool to scan a github repo before download/install it locally? I'm thinking of semgrep or socket.dev but I wonder if there's a better option
People need to do their due diligence when including open-source software and packages not just when they first use them but anytime you have a need to upgrade them. I highly doubt I'm the first one to think of this, but there really aught to be tool or comprehensive set of tools that routinely scan open-source software and packages for potentially malicious code and alert users of the problem(s).
There are. Socket, Aikido, and a number of others do this all the time.
Step-Security, Wiz ..
is it possible to ban them or report them ?
I uploaded several of these virus-infected archives to VirusTotal. In each archive, under the “Network Communication” section, the virus makes requests to three resources: a GET request to a website to retrieve IP information, a POST request to a Polygon RPC node (drpc), and a POST request to what appears to be the virus creator’s server. I can only assume that the scheme is designed to steal cryptocurrency.
> Another month later, GitHub support sent me an email saying that they had removed these repositories.

I recently discovered a campaign where somebody was forking very small but useful codebases, and replacing the distributable with some malware, and making the repository have better SEO with changes to the README. My case was a simple macOS application that could be used to control some Phillips LED light strip.

I reported it to GitHub and it was removed within 24 hours.

I discovered another repository like this, and they still haven't replied since (one month).

No clue how their malware reports work. I'm surprised they don't partner with some antivirus company to at least scan "releases" for malware (not repositories themselves)

> I'm surprised they don't partner with some antivirus company to at least scan "releases" for malware

...like Windows Defender? Oh, the irony :D

It will feel very spooky when they stop updating because of this essay .
damn 10k ? thats a lot, how did you get them ?
Hmm. Using a script. That's explained in the article)
are there any ci/cd that controls them?
Microsoft: and the one thing we absolutely refuse to use AI for is to flag this kind of bullshit to protect users, because it would violate the rule of "don't do anything actually useful with it".
Hi Claude fable, why u not protecting me from malware? Am i not american enough? Not rich enough? Yieks..