I think it's a serious threat, especially with LLMs now because people can make believable packages at scale. Not everyone vets their packages thoroughly
Speaking of LLMs. Since LLMs like to hallucinate every now and then, an LLM could also hallucinate names of packages that it tells people to install. And those packages could in turn have been squatted by malware authors.
And in this way, malicious packages may be unintentionally downloaded by users even when those malicious packages did not yet exist when the LLM was trained. Just because the hallucinated package name was randomly later taken by someone malicious.
I've seen this effect get amplified also when somebody puts a "bad" answer in a public place like StackOverflow. It is possible to have quite a large blast radius from something like this!
You've always been able to make "believable" packages at scale. PyPI doesn't enforce uniqueness: you can crank out malicious near-duplicates of any package you please.
Stack Overflow and Google search results were already doing that though, at massive scale. I agree it changes things somehow, but people not thinking before acting is not a new problem.
> "While being flooded with spam is never good, it gets immediately noticed and mitigated. It's harder for open source projects to spot and stop rare one-offs"
This is the real problem that NPM and other ecosystems face. A determined attacker that is trying to "poison" a popular Open Source package just has to feign as a maintainer long enough to succeed[0].
Defeating these types of attacks will require rethinking how we think about trust of packages.
Projects like Deno are one approach (fork the ecosystem) while projects like Packj (mentioned elsewhere here), Socket.dev, and LunaTrace[1] are taking the other angle (make it harder to install malware).
It's hard to say which approach is better right away. (Probably a hybrid of both, realistically) It's just non-trivial to fix this in one clean swoop. It's messy.
And in this way, malicious packages may be unintentionally downloaded by users even when those malicious packages did not yet exist when the LLM was trained. Just because the hallucinated package name was randomly later taken by someone malicious.