I have to say, the principle that open-source software can't do anything nefarious because the source is open just hasn't held up for a lot of reasons -- including that nobody has the time to inspect the code, let alone ensure that it matches the binaries; and also that GitHub has become a distribution hub for software used by lots of people with no ability or interest in auditing the software they use.
> the principle that open-source software can't do anything nefarious because the source is open just hasn't held up for a lot of reasons
You've been living on such a principle? That sounds insane, why would something not be nefarious just because you can read the code?
The way I was "raised" by FOSS greybeards screaming at me through web forums, was that any software available on 3rd party websites anyone can upload anything to, will be filled with viruses and malware, and this was early 2000s. Surely people still advocate for this mindset today, when it's even more likely?
> You've been living on such a principle? That sounds insane
Fun fact, I've spent the last few days fretting over whether to add H2 to my FabricMC mod. The problem being that I don't know what class-loading shenanigans could possibly occur if I jar-in-jar include it: what happens if another mod has H2 jar-in-jar included? Will my mod only reference its own version of H2? What implications [if any] would that have? Or will the Fabric Loader pick one? What if another mod has H2 shaded instead? Will the classes clash differently? What if, instead of jar-in-jar including it, I shade and relocate it? Does H2 or JDBC rely on reflection or services that would render it non-functional?
All recommendations point to using/creating a mod specifically for that library and depending on it. As luck would have it, one already exists on Modrinth. Except... I'm then requiring anyone who trusts my mod to also install this other mod that I have no control over. I just looked at the source code and it looks fine, but that's if you trust that the published jars are the exact result of that source code: maybe there's something malicious in the Gradle Wrapper binary. This mod could at any time become malicious and how would I detect that?
Guess what? I asked around and was summarily told to stop worrying, that it's fine. We on this website need to realise that we're a minority: NO ONE is routinely (or even occasionally) scrutinising the source code of the stuff they install from third-party websites. I have never, not once, seen anyone hash a downloaded file to check that it matches what's on the website. At the very most, I've seen people find the Github repo, see that it has a lot of stars, and then assume it's safe.
Aha, wasn't that argument more about that closed source software is more likely to hide stuff you don't agree with, than FOSS? Not necessarily that FOSS won't have any viruses or malware, but it's at least less likely. That was my take away, but long time ago I read the book admittedly, I might misremember or transformed it automagically over time.
This is my takeaway as well. Having the source code open makes it auditable, if not by you, maybe the community.
The free software license specifically gives the software an extra advantage in that changes to the software must be shared openly, if distributed as as binaries.
> source code open makes it auditable, if not by you, maybe the community
I think part of why this social engineering works so well is it takes advantage of that "many eyes" trust, where people are prone to delegating the responsibility of checking to the community and not do due diligence on themselves. I know I'm susceptible to it if I see a Github repo with more than 10k stars on it.
You'd better read it again, because that claim does not figure in that text. You might mean that with more eyes on the code, more bugs are found, than with no eyes on the code. But that is not what you are saying here.
I have not, but in case you missed it, this principle has been used by open source proponents for decades. I'm an open source developer myself, but always found it odd.
No, it's really not, and really hasn't been. Do people truly have such poor reasoning and logic skills?
"Closed source software is inscrutable, impossible for me to fix, impossible for me to review the source" is absolutely a distinct statement from "it is impossible to hide malware in open-source software". I've literally never heard someone claim the latter.
> "it is impossible to hide malware in open-source software"
No nobody said "exactly that". But many times I've seen people claiming to trust open source as it is safer and people can check and build themselves. Seen it too many times. But reality is different than what is claimed.
The problem the article is describing seems to have little to do with open source. There were GitHub repositories that had links added in their READMEs to a zip file containing compiled binaries.
GitHub is not a curated software repository. It's essentially no different from some random stranger linking to some binaries on a forum. (There are communities that seem to have no concerns about running unknown binaries from strangers in forum threads, but I wouldn't recommend it.)
I think that this is becoming increasingly true only for large, well-known repositories, where the maintainers have a lot to lose by doing anything shady. I don't think the React team could get away with doing something like that, for example.
Not true. If statistics offer a “measure” of reality, my guess is that “OS doing nefarious things” must fall between 0,005% and 0,007%. In any case compared to the extracted value it’s … nothing.
> I typed the project name into Google, and my repository appeared in the results. I entered the same query into Bing, and someone else’s repository appeared in the results
Side story, this kind of thing is what made me stop using Bing.
I had been using it as the default for searches (it sucks, but it's at least not Google), until I landed on a phishing page for my bank (I haven't committed it to memory yet). The page was a near perfect copy, and I would easily have gotten pwnd by it if they didn't have a modal asking me to run some code in my terminal for "security activation" that made me go "that's a little odd... Is this the right address OH SHIT that's a .ru domain"
I never see Google return phishing pages or typo squatters in the first page. Bing constantly returns that stuff in the first several results.
Exactly. All these ideals work in theory but then in reality banks are also incompetent and will use all kinds of domains.
Same with meta and Google where they often direct you to domains that aren't under their main one and it's actually legit, but there's no way to know. It's impossible to teach family members to pay attention if it's really that domain because it's often legit not that domain.
Why would you go to your bank by first searching for it? Sounds very insecure to me. I type my banks url directly instead, or if that gets tedious, store it as a bookmark.
I know several people who search for important sites, click uncritically on links, and get scammed. This is not so good.
> Why do they delete a commit and push a new one every few hours?
May be to make it appear on the top of the "Last Updated" repositories in case someone searches for the repo or a keyword. So instead of the author's actual repo, the users endup cloning the trojan infected one.
- This is a new repository, not a fork
- All repositories have different contributors and different names
From the last two points, it becomes clear that even if we find one such repository, we won’t be able to find other similar repositories using it.
In previous campaigns the repositories were linked to a few users. But those users had starred other users, that at the same time had also cloned other repositories with the malware. Sometimes the malicious repository had been cloned from another malicious repo, and if you listed the repositories and "friends" of that user, all were part of the botnet.
Also, github doesn't delete repositories and accounts, they mark them as deleted. If you use their api you can still list them.
It happened a few times to me that I'd find some very well constructed scam scheme (cryptocurrency washing systems, web platform/phishing scams), then I'd research deeper into it to see how it worked, just to ultimately feel powerless not knowing what to do with the information.
No individual person can be the superhero that saves the day on everyone's behalf. But what we can do is provide what little help or insight that we have, and then pass the issue along to others.
Perhaps all it means is that you end up doing what OP did: the "deeper" research that you mentioned plus a little post on Hacker News or elsewhere.
Even if nothing comes of it in the end, at least you'll have tried.
>The zip archive contains 4 files: Application.cmd or Launcher.cmd loader.exe or luajit.exe or another_name.exe random_name.cso or random_name.txt
lua51.dll If you submit a link to the archive to VirusTotal, it will find 0 viruses. If you submit the zip file itself, it will detect a Trojan inside it.
Any open source tool to scan a github repo before download/install it locally? I'm thinking of semgrep or socket.dev but I wonder if there's a better option
People need to do their due diligence when including open-source software and packages not just when they first use them but anytime you have a need to upgrade them. I highly doubt I'm the first one to think of this, but there really aught to be tool or comprehensive set of tools that routinely scan open-source software and packages for potentially malicious code and alert users of the problem(s).
I uploaded several of these virus-infected archives to VirusTotal. In each archive, under the “Network Communication” section, the virus makes requests to three resources: a GET request to a website to retrieve IP information, a POST request to a Polygon RPC node (drpc), and a POST request to what appears to be the virus creator’s server. I can only assume that the scheme is designed to steal cryptocurrency.
> Another month later, GitHub support sent me an email saying that they had removed these repositories.
I recently discovered a campaign where somebody was forking very small but useful codebases, and replacing the distributable with some malware, and making the repository have better SEO with changes to the README. My case was a simple macOS application that could be used to control some Phillips LED light strip.
I reported it to GitHub and it was removed within 24 hours.
I discovered another repository like this, and they still haven't replied since (one month).
No clue how their malware reports work. I'm surprised they don't partner with some antivirus company to at least scan "releases" for malware (not repositories themselves)
Microsoft: and the one thing we absolutely refuse to use AI for is to flag this kind of bullshit to protect users, because it would violate the rule of "don't do anything actually useful with it".