Hacker News
by tamimio 745 days ago
Lesson for the people who run and execute stuff without looking at the code first.

Which is everybody in the world except for a handful of people.
Most people should only download software from people they trust (to not be evil and also to be competent).

If you download code off some unknown person's GitHub repo, you'd be stupid not to read it very very carefully!

Not really, and it only takes a few minutes because most of these packages (npm ones included) are small. You don’t have to read the WireGuard codebase because it’s reputable enough, but for obscure or unknown add-ons and packages, it’s on you to double-check the code, just like reading the ‘readme’.
So just sneak the code in a dependency of a dependency.

Who’s diving 3-4 layers deep into dependencies?
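To make "a dependency of a dependency" concrete, here is a minimal Python sketch that computes how many layers down each package sits, using a breadth-first search. The graph, the package names, and the `dependency_depths` helper are all made up for illustration; a real tree would come from your package manager's lockfile.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
GRAPH = {
    "my-app": ["web-framework", "css-tool"],
    "web-framework": ["router", "logger"],
    "css-tool": ["parser"],
    "router": ["path-utils"],
    "logger": [],
    "parser": ["tokenizer"],
    "path-utils": ["string-helpers"],
    "tokenizer": [],
    "string-helpers": [],
}

def dependency_depths(graph, root):
    """Return the shortest depth at which each package is reachable from root."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depths:  # in BFS the first visit is the shortest path
                depths[dep] = depths[pkg] + 1
                queue.append(dep)
    return depths

depths = dependency_depths(GRAPH, "my-app")
deep = sorted(p for p, d in depths.items() if d >= 3)
print(deep)  # ['path-utils', 'string-helpers', 'tokenizer']
```

Even in this toy tree, three of the eight dependencies sit three or more layers below the app, which is exactly where nobody looks.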

No need to hide it inside dependencies, just modify the code before building and pushing the package to PyPI.
You can't "not really" this away. Most people don't bother looking at small package code, much less code for packages that are far more complex.
I haven’t looked at the source code of a single npm package I’ve installed in the past 5 years.

“It takes a few minutes”

Dude, my web dev projects have, like, thousands of dependencies. I’m not going to check the source code of every package Tailwind requires.

Even if you did review it, a motivated attacker is not going to write a function called exfiltrate_user_data(). The xz backdoor exploit was incredibly sophisticated, and one key part of the design was sneaking a "." into a single line of a build test script.
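Why one "." matters: as widely reported, the stray dot sat in a build-time check that compiled a small C snippet to detect Landlock sandboxing support, and a check that fails to compile is read as "feature not supported". The sketch below is only a Python analogue of that pattern (the check sources and the `feature_available` helper are hypothetical, not the real xz CMake code).

```python
# Hypothetical feature check: "does this snippet compile?"
GOOD_CHECK = "import os\nhas_feature = hasattr(os, 'getpid')\n"
# The same check with one stray "." at the start of a line.
SABOTAGED_CHECK = ".\nimport os\nhas_feature = hasattr(os, 'getpid')\n"

def feature_available(check_source):
    """Treat 'the check failed to compile' as 'the feature is unsupported'."""
    try:
        compile(check_source, "<feature-check>", "exec")  # compile only, never run
    except SyntaxError:
        return False  # the feature is silently disabled; no error reaches the user
    return True

print(feature_available(GOOD_CHECK))       # True
print(feature_available(SABOTAGED_CHECK))  # False
```

The build still succeeds either way, so a reviewer skimming the diff sees one odd character and no obvious payload.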

A cursory audit of primary dependencies has almost zero chance of catching anything but a brazen exploit.

Yeah. Realistically I think the best course of action is just assume you’re already using a library that can exfiltrate data.

This requires allowlisting egress traffic and possibly even architecting things to prevent any one library from seeing too much. This approach can be a big pain, though, and could be difficult to implement in practice.
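A minimal Python sketch of the deny-by-default decision behind egress allowlisting, assuming a hypothetical `ALLOWED_HOSTS` set and `is_egress_allowed` helper. It only shows the logic: a malicious library runs in the same process and can simply skip an in-app check, so in practice the rule belongs in a firewall or egress proxy.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this service is expected to talk to.
ALLOWED_HOSTS = {"api.example.com", "db.internal.example"}

def is_egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Deny by default: permit an outbound request only to an exact allowlisted host."""
    host = urlsplit(url).hostname  # lowercased, with port and credentials stripped
    return host is not None and host in allowed_hosts

print(is_egress_allowed("https://api.example.com/v1/items"))     # True
print(is_egress_allowed("https://api.example.com.evil.net/x"))   # False
```

Exact host matching (rather than substring or suffix matching) is what stops the look-alike `api.example.com.evil.net` from slipping through.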

Imo this makes no sense. There's zero chance you will start inspecting all dependencies, even in a relatively small application, which nowadays can already pull in a large number of deps.

I don't see how doing any of this manually will help.

This is why I refuse to use almost anything on npm. If you have a zero dependency project I'll consider it. If you have a dependency that also has a set of dependencies then I will never use your code.
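A policy like "no dependency may have dependencies of its own" can be checked mechanically. Here is a minimal Python sketch, assuming the npm `package-lock.json` v2/v3 layout where `packages[""]` is the root project and `packages["node_modules/<name>"]` describes an installed package. The helper name and the sample lockfile are made up, and optional or peer dependencies are ignored for brevity.

```python
import json

def dependencies_with_dependencies(lock):
    """Return direct dependencies that pull in dependencies of their own."""
    packages = lock.get("packages", {})
    root = packages.get("", {})
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))
    offenders = []
    for name in sorted(direct):
        entry = packages.get(f"node_modules/{name}", {})
        if entry.get("dependencies"):  # this package has its own dependency list
            offenders.append(name)
    return offenders

# Trimmed, made-up lockfile for illustration.
SAMPLE_LOCK = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"dependencies": {"tiny-lib": "^1.0.0", "big-lib": "^2.0.0"}},
    "node_modules/tiny-lib": {"version": "1.0.0"},
    "node_modules/big-lib": {"version": "2.1.0", "dependencies": {"helper": "^3.0.0"}},
    "node_modules/helper": {"version": "3.0.0"}
  }
}
""")

print(dependencies_with_dependencies(SAMPLE_LOCK))  # ['big-lib']
```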
Would you have caught the XZ backdoor?
Everyone runs code they have not inspected. For example, almost no one has read all of the code in FreeBSD, Linux (the kernel), macOS, OpenBSD, or Windows. I also doubt people are reading all of the code in their favorite Linux distribution.

Even inspecting the code is not enough because a lot of security vulnerabilities are not obvious. Basically, security is hard, and often there are not a lot of good solutions.

Here are some tricks I have found which have helped me minimize my risk:

1) Use different machines for different purposes. Basically, you should not use one PC (or Mac) for everything. I have one for my finances, one for gaming, and a general-purpose PC. If one gets hacked, the others are still fine.

2) Get software from trustworthy sources. Most of the major software companies are not going to ship malicious code. For open-source software, use software from popular projects which have a good reputation.

3) Ask yourself why someone is providing this software. Is it for money? Are they creating it because they enjoy it? How do they support themselves? For example, Google's business model is building a dossier on people so it can deliver ads they are more likely to click on. When Google gives you something for "free", they will probably use it to track you, or track visitors to your website.

4) Support the people who build the software you use. If it's commercial software, pay for it; do not pirate it. If it's open source, donate time or money to the projects you use. Also, thank the people who work on the software, and ALWAYS treat them with respect.

5) Avoid pirated software, software from "free" porn websites, etc. People who provide illegal or sketchy software are probably willing to put backdoors in it.

> For open-source software, use software from popular projects which have a good reputation.

On this topic, how much should a person trust central repositories of well-known operating system distributions (e.g. Arch, Debian)? I know only trusted people can upload to them, and the only time I've ever heard of malware slipping past them was XZ, but I don't know how much care they take.

Ain't nobody got time for that. LLMs should be capable of analysing code for anything malicious / suspicious.
Unfortunately, no, because the existence of LLMs that can automatically determine code that is suspicious will be offset by the existence of LLMs that can generate malicious code that bypasses the detection abilities of the aforementioned LLMs.
Generative Adversarial LLMs, let’s go!
Perhaps we could just call these ALLMs (Adversarial Large Language Models). You’re already dropping the N in GAN, I see no need for the G.

As an end result I think someone clever could make a LLaMA pun for the name of a LLaMA based ALLM.

No, they can't work with large codebases, not yet, and they have very limited talent for logic and debugging. They may improve at some point, probably by being hooked up to external tools.
Since LLMs and keyloggers are Turing machines, it won't happen. (Or more precisely: it won't beat the cat-and-mouse game of obfuscation.)
You're hired!