Hacker News
by woodruffw 1217 days ago
I understand that this is meant to be an eye-popping press release (and implicitly a product spotlight), but some of these claims make me gag.

It's not an attack "on" PyPI, or even an attack at all: someone is just spamming the index with packages. There's no evidence that these packages are being downloaded by anyone at all, or that the person in question has made any serious effort to conceal their intentions (it's all stuffed in the setup script without any obfuscation, as the post says). The executable in question isn't even served through PyPI (for reasons that are unclear to me): it's downloaded by the dropper script. Ironically, serving the binary directly would probably raise fewer red flags.
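Because the dropper logic sits unobfuscated in the setup script, even a naive static check would flag it. A minimal sketch, assuming a hand-picked list of suspicious call names (`SUSPICIOUS_CALLS`, `dotted_name` and `scan_setup_script` are hypothetical, not taken from any real scanner). It parses the source with the stdlib `ast` module instead of running it:

```python
import ast
from typing import Optional

# Hypothetical, hand-picked call names worth a second look in a setup
# script; real scanners use far richer rules than this.
SUSPICIOUS_CALLS = {
    "urllib.request.urlopen",
    "urllib.request.urlretrieve",
    "requests.get",
    "os.system",
    "subprocess.run",
    "subprocess.Popen",
    "exec",
    "eval",
}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Rebuild a dotted name like 'os.system' from a call's func node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on other calls' results


def scan_setup_script(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) pairs for suspicious calls, without executing anything."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)  # ast.walk is breadth-first, so sort by line


if __name__ == "__main__":
    sample = (
        "from setuptools import setup\n"
        "import os\n"
        "os.system('echo hello')\n"
        "setup(name='example')\n"
    )
    print(scan_setup_script(sample))  # [(3, 'os.system')]
```

Matching on dotted names is trivially evaded (`from os import system`, or `getattr(__import__("os"), "system")`), so a clean result proves little; it only catches payloads as careless as these.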

Supply chain security is important; we should reserve phrases like "aggressive attack" for things that aren't script kiddie spam.


The most "aggressive" part is that those sweet package names like "colorslib" are being stolen.
My biggest curiosity here is how they generated over a thousand package names ranging from feasible to interesting. I expected gibberish.

Lol, maybe, "chatgpt, give me a thousand feasible pypi package names"?

The names seem to be simple concatenations of random parts like "game", "lib", "vm", "cv", "http".

They do look surprisingly convincing.
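A minimal sketch of the concatenation scheme guessed at above, assuming names are just ordered pairs of distinct fragments. `PARTS` and `concat_names` are hypothetical: the fragments come from examples in this thread, and the real generator is unknown. The arithmetic explains the volume: n fragments give n * (n - 1) names, so 33 fragments already yield 33 * 32 = 1,056.

```python
from itertools import permutations

# Hypothetical fragment list built from examples in this thread;
# the actual generator behind the spam is unknown.
PARTS = ["game", "lib", "vm", "cv", "http", "colors"]


def concat_names(parts: list[str]) -> list[str]:
    """Join every ordered pair of distinct fragments into one candidate name."""
    return sorted(a + b for a, b in permutations(parts, 2))


names = concat_names(PARTS)
print(len(names))            # 30 (6 * 5 ordered pairs)
print("colorslib" in names)  # True
```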

Thankfully, they're not actually being stolen because all the packages were already taken down; they're available for legitimate use again: https://pypi.org/project/colorslib/
While I think that _may_ be the right thing to do here... it's a bit worrying, as recycling names like that has its own share of risks.
I think it's a serious threat, especially now with LLMs, because people can make believable packages at scale. Not everyone vets their packages thoroughly.
Speaking of LLMs. Since LLMs like to hallucinate every now and then, an LLM could also hallucinate names of packages that it tells people to install. And those packages could in turn have been squatted by malware authors.

And in this way, users may unintentionally download malicious packages that did not even exist when the LLM was trained, simply because someone malicious later registered the hallucinated package name.

I've seen this effect get amplified also when somebody puts a "bad" answer in a public place like StackOverflow. It is possible to have quite a large blast radius from something like this!
An attacker could also try to get a list of packages that the LLMs hallucinate, and squat on those.
You've always been able to make "believable" packages at scale. PyPI doesn't enforce uniqueness: you can crank out malicious near-duplicates of any package you please.
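Name normalization as defined in PEP 503 (lowercase, with runs of `-`, `_` and `.` collapsed to a single `-`) only folds trivial spelling variants together; it does nothing about look-alikes. A client could flag those before installing. A minimal sketch using the stdlib `difflib`, assuming a locally kept list of popular names and a similarity cutoff of 0.8 (both arbitrary choices; `near_duplicates` and the misspelled example name are hypothetical):

```python
import difflib
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_', '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def near_duplicates(candidate: str, popular: list[str], cutoff: float = 0.8) -> list[str]:
    """Return popular names that look like `candidate` without being the same project."""
    norm = normalize(candidate)
    norm_popular = [normalize(p) for p in popular]
    if norm in norm_popular:
        return []  # exact match after normalization: it is that project
    return difflib.get_close_matches(norm, norm_popular, n=3, cutoff=cutoff)


print(near_duplicates("reqeusts", ["requests", "numpy"]))  # ['requests']
```

A fixed cutoff trades false alarms against misses; short names in particular drift past any threshold with a single changed character.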
And, to parent's point, now LLMs will tell people to use them and they will[1].

[1] https://news.ycombinator.com/item?id=34916682

Stack Overflow and Google search results were already doing that though, at massive scale. I agree it changes things somehow, but people not thinking before acting is not a new problem.
I agree that it is a threat. I don't think this instance is (it's too noisy).

I wrote a comment on the NPM thread earlier (https://news.ycombinator.com/threads?id=freeqaz) that I'll quote here:

> "While being flooded with spam is never good, it gets immediately noticed and mitigated. It's harder for open source projects to spot and stop rare one-offs"

This is the real problem that NPM and other ecosystems face. A determined attacker who is trying to "poison" a popular Open Source package just has to pose as a maintainer long enough to succeed[0]. Defeating these types of attacks will require rethinking how we establish trust in packages.

Projects like Deno are one approach (fork the ecosystem) while projects like Packj (mentioned elsewhere here), Socket.dev, and LunaTrace[1] are taking the other angle (make it harder to install malware).

It's hard to say right away which approach is better (realistically, probably a hybrid of both). It's just non-trivial to fix this in one fell swoop. It's messy.

0: https://www.trendmicro.com/vinfo/us/security/news/cybercrime...

1: https://github.com/lunasec-io/lunasec

Me, I just use the stdlib and my local packages.

There's something beautiful in knowing you're using pure, clean Python. Much easier to install, also.

No. This is very concerning.

Attacking a popular repository like this does not have to have a high hit rate.

"Script kiddie spam" is how computers get compromised now. Unsophisticated mass attack.

This sort of thing, combined with woeful security and fragile systems, is causing havoc the world over.