by vaughandroid 1491 days ago
You've obviously thought about this quite a bit... Do you have any ideas as to how projects could avoid this problem?

I have, but to be honest I've forgotten a lot of the specific debates and their finer points.

I don't have a single, definitive, clear solution -- as pointed out by others -- nobody does. It's not a simple problem.

That doesn't mean that steps can't be taken to improve the situation, perhaps dramatically in some cases.

1) Enforced MFA to publish a crate -- credential theft is semi-regularly seen as an attack vector.

2) Strong links between the "source ref" and the specific crate versions. An example of this done super badly is NuGet. All of the hundreds (thousands?) of Microsoft ASP.NET packages point to the same top-level asp.net or .net framework URLs. E.g.:

https://www.nuget.org/packages/Microsoft.Extensions.Configur...

That package links to "https://dot.net" as the Project Website, and "https://github.com/dotnet/runtime" as the repository. This couldn't be more useless. Where is the Git hash for the specific change that "7.0.0-preview.4.22229.4" of this library represents? Who knows...
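To make the "strong link" idea concrete, here is a rough sketch of the kind of check a registry could run at publish time. The Release record and its commit field are hypothetical, not anything NuGet or crates.io actually exposes, and it assumes 40-character SHA-1 Git hashes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: commits are identified by full 40-hex-digit SHA-1 hashes.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class Release:
    """Hypothetical registry metadata for one published version."""
    name: str
    version: str
    repository: str
    commit: Optional[str] = None  # hypothetical field: exact source revision


def has_pinned_source(release: Release) -> bool:
    """True only if the release names an exact commit, not just a repo URL."""
    return release.commit is not None and FULL_SHA1.fullmatch(release.commit) is not None
```

A registry could refuse to publish (or at least flag) any release where has_pinned_source returns False.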

3) Namespaces. They're literally just folders. If you can't code this, don't run a huge public website. This is more important than it sounds, because wildly unrelated codebases might have very similar names, and it's all too easy to accidentally drag in entire "ecosystems" of packages. Think of the Apache Project. It's fine and all if you've "bought in" to the Apache way of doing... everything. But imagine accidentally importing some Google thing, some Netflix thing, some Apache thing, and some Microsoft thing into the same project. Now your 2 KLOC EXE is 574 megabytes and requires 'cc', 'python', and 'pwsh' to build. Awesome.

For example, in ASP.NET projects I avoid anything not officially published by Microsoft and with at least 10M downloads because otherwise it's guaranteed to be a disaster in 5-10 years. Ecosystems diverge, wildly, and no single programmer or even a small group could possibly stitch them back together again. Either it's a dead end of no further upgrades, or rip & replace an entire stack of deeply integrated things.

4) Publisher-verified crate metadata / tags. You just cannot rely on the authors to be honest. It's not even about attacks, it's also about consistency and quality. All crates should be compiled by the hosting provider in isolated docker containers or VMs using a special "instrumented build" flag. Every transitive dependency should be indexed. Platform compatibility should be verified. Toolchain version compatibility should be established for both the minimum and maximum supported versions. Flags like "no-std" or whatever should be automatically checked. CPU and platform compatibility would also be very helpful for a lot of users. The most important one in the Rust world would be the "No unsafe code" tag.

This would stop "soft attacks" such as the guy spamming C++ libraries as Rust crates. Every such crate should have been automatically labelled as: "Requires CC" and "Less than 10% Rust code".
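A minimal sketch of how those two labels could be derived automatically, assuming the registry already has a per-file line count for the crate. The extension lists and the label_crate name are my own illustration, and "has C/C++ sources" is only a heuristic for "requires a C compiler".

```python
from pathlib import PurePosixPath

# Assumed, non-exhaustive extension groups.
RUST_EXTS = {".rs"}
C_FAMILY_EXTS = {".c", ".cc", ".cpp", ".cxx", ".h", ".hpp"}


def label_crate(files: dict[str, int]) -> list[str]:
    """Derive tags from a mapping of file path -> number of lines."""
    rust_lines = 0
    c_lines = 0
    for path, count in files.items():
        ext = PurePosixPath(path).suffix.lower()
        if ext in RUST_EXTS:
            rust_lines += count
        elif ext in C_FAMILY_EXTS:
            c_lines += count
        # Other files (docs, configs) are ignored in this sketch.
    tags = []
    if c_lines > 0:
        tags.append("Requires CC")
    total = rust_lines + c_lines
    if total > 0 and rust_lines / total < 0.10:
        tags.append("Less than 10% Rust code")
    return tags
```

For a thin Rust wrapper around a vendored C++ library, e.g. label_crate({"src/lib.rs": 50, "vendor/big.cpp": 5000}), both tags come back.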

Similarly, if a crate/package changes its public signature in a breaking way, then the publishing system should enforce the right type of semantic versioning bump.
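As a sketch of what that enforcement could look like: assume the publishing system can extract the public API of each version as a mapping of item name to signature string (that extraction is the hard part and is not shown). The function names are hypothetical, and it only handles plain "MAJOR.MINOR.PATCH" versions, ignoring pre-releases and the special treatment of 0.x.

```python
RANK = {"patch": 0, "minor": 1, "major": 2}


def required_bump(old_api: dict[str, str], new_api: dict[str, str]) -> str:
    """Smallest semver bump that is honest about the API difference."""
    removed = old_api.keys() - new_api.keys()
    changed = {n for n in old_api.keys() & new_api.keys() if old_api[n] != new_api[n]}
    added = new_api.keys() - old_api.keys()
    if removed or changed:
        return "major"  # existing callers can break
    if added:
        return "minor"  # purely additive
    return "patch"


def declared_bump(old_version: str, new_version: str) -> str:
    """Classify the bump the author actually made."""
    old = tuple(int(p) for p in old_version.split("."))
    new = tuple(int(p) for p in new_version.split("."))
    if new <= old:
        raise ValueError("new version must be greater than old version")
    if new[0] > old[0]:
        return "major"
    if new[1] > old[1]:
        return "minor"
    return "patch"


def bump_is_sufficient(old_version, new_version, old_api, new_api) -> bool:
    """True if the declared bump is at least as large as the required one."""
    return RANK[declared_bump(old_version, new_version)] >= RANK[required_bump(old_api, new_api)]
```

With this, publishing 1.3.0 after changing the signature of an existing function would be rejected, while 2.0.0 would pass.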

Essentially, what I would like to see is something more akin to a monorepo, but not technically a single repository. That is, a bunch of independent developers doing their own thing, but with a cloud-hosted central set of tooling that helps gain the same benefits as a monorepo.

I'm expecting a lot of arguments along the lines of "that sounds like a lot of work, etc..." Meanwhile Mozilla had a large team for this, millions of dollars of funding, and did not do even 0.1% of what Matt Godbolt did in his spare time...

Good answer. Here is one more, and I've got to say I am really unsure why it isn't done this way: Check each update to a crate for use of a TCP / HTTP / UDP stack. There is absolutely no reason a crate for math (for example) should be introducing any of that in its code. If you catch something like this, you can be 99% sure it's malware.
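A deliberately naive sketch of that check: diff the networking-related paths mentioned in the old and new source of a crate. The marker list is an assumption and far from complete, and plain text matching like this is easy to evade, so treat it as a first-pass filter rather than a real detector.

```python
import re

# Assumed, non-exhaustive list of Rust paths that indicate networking.
NETWORK_MARKERS = ("std::net", "tokio::net", "reqwest::", "hyper::")


def network_markers(source: str) -> set[str]:
    """Return the markers that appear in the given Rust source text."""
    return {m for m in NETWORK_MARKERS if re.search(r"\b" + re.escape(m), source)}


def newly_introduced(old_source: str, new_source: str) -> set[str]:
    """Markers present in the new version but absent from the old one."""
    return network_markers(new_source) - network_markers(old_source)
```

A non-empty result from newly_introduced on a math crate would be the signal to hold the update for review.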

Or even better, make crates request permissions for what kind of functions they can call, similar to the Chrome plugin API. A graph crate doesn't need encryption, file opening or netstack permissions.

Transitive crate "permissions" would be amazing. To know at a glance if a crate does networking, filesystem access, IO, etc.
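Roughly, the "at a glance" view is just the union of declared permissions over the whole dependency graph. A small sketch, assuming each crate declares its own permission set somewhere the registry can read; the crate names and permission strings are made up for illustration.

```python
def transitive_permissions(crate: str,
                           deps: dict[str, list[str]],
                           declared: dict[str, set[str]]) -> set[str]:
    """Union of permissions declared by a crate and everything it depends on."""
    seen: set[str] = set()
    perms: set[str] = set()
    stack = [crate]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        perms |= declared.get(current, set())
        stack.extend(deps.get(current, []))
    return perms


# Hypothetical example graph.
deps = {"my-app": ["graphs", "fetcher"], "fetcher": ["tls"], "graphs": []}
declared = {"fetcher": {"network"}, "tls": {"crypto", "filesystem"}}
```

Here "my-app" ends up with network, crypto and filesystem even though it declares nothing itself, while "graphs" stays empty.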

Someone could always roll their own IO, but self reporting and automated detection tooling (to catch those that slip through the cracks) would bring this percentage way down.

Maybe the language could even evolve "unsafe" for IO, even if just as a flag for users. That way it would all be incredibly easy to audit.

Packj tool (https://github.com/ossillate-inc/packj) analyzes Python/NPM packages for risky code and attributes such as network/file permissions and expired email domains. It uses static code analysis. We are adding support for Rust. We found a bunch of malicious packages on PyPI using the tool, which have now been taken down; examples: https://packj.dev/malware [disclosure: I’m one of the developers]

One of my thoughts (see my profile) is "library mesh". You could register your code (or the parts that integrate) as a reverse dependency on what you depend on, and they would build your code as part of their build to see if anything breaks.
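A toy sketch of the registration side of that idea. The Mesh class is hypothetical, and each check callable stands in for "build and test the downstream project against the candidate upstream change", which in reality would be a full CI job.

```python
from typing import Callable


class Mesh:
    """Registry of downstream checks, keyed by the upstream library."""

    def __init__(self) -> None:
        self._downstream: dict[str, dict[str, Callable[[set[str]], bool]]] = {}

    def register(self, upstream: str, downstream: str,
                 check: Callable[[set[str]], bool]) -> None:
        """Record that `downstream` wants to be built against `upstream`."""
        self._downstream.setdefault(upstream, {})[downstream] = check

    def broken_by(self, upstream: str, candidate_api: set[str]) -> list[str]:
        """Names of registered dependents whose check fails on the candidate."""
        checks = self._downstream.get(upstream, {})
        return sorted(name for name, check in checks.items()
                      if not check(candidate_api))


mesh = Mesh()
mesh.register("libfoo", "app-a", lambda api: "parse" in api)
mesh.register("libfoo", "app-b", lambda api: "render" in api)
```

If libfoo's maintainers try a change that drops render, mesh.broken_by("libfoo", {"parse"}) tells them app-b would break before they publish.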

Refactoring breaks everything - just look at Python 2 to Python 3. Part of the problem is having a bill of materials and accurate tracing of the ingredients of a build, such as reproducible builds. But these take a lot of time and hard work.

Computers are good at cross-referencing. If you indexed everything in a graph database, it should be a simple graph search to find the dependencies of a git commit hash or of a binary installer file's sha256: a web of sha256 relationships, with a mapping between them and commit SHAs. It would be useful for security too, but I'm also interested in things being robust and reliable. Like you say, it's a monorepository but not one.
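The search itself really is simple once the index exists. A sketch using a plain dict as a stand-in for the graph database; the node identifiers are placeholders, not real hashes, and building the index is the part that actually takes work.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first list of every node reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Placeholder index: an installer maps to the commit it was built from,
# and a commit maps to the artifacts it depends on.
index = {
    "sha256:installer-1": ["commit:app-abc"],
    "commit:app-abc": ["sha256:libfoo-1", "sha256:libbar-2"],
    "sha256:libfoo-1": ["commit:libfoo-def"],
}
```

Starting from "sha256:installer-1", reachable(index, ...) walks through the app commit to both library artifacts and on to the libfoo commit.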

Tooling is what's needed.

The Python packaging experience, Ruby's Bundler, npm and other version manager experiences simply lead to common breakage.

> Similarly, if a crate/package changes its public signature in a breaking way, then the publishing system should enforce the right type of semantic versioning bump.

A common follow-up question the Haskell people used to ask when that was being discussed there was: if your build system can do that, what do you need semantic versioning for?

Still, it's better than doing nothing. There's nothing similar to Stackage or Backpack in Rust, so it would be a clear gain. It's just that you can go further.

I work with a lot of .NET Framework legacy code. Those NuGet packages have even more useless links. I’ve had official packages whose links 404 (to be fair, making their own links 404 is a specialty of Microsoft; for documentation they take the opposite approach to Windows: nothing should be backwards compatible -.-), but more often they link to some modern .NET Core (or .NET 6 now, I guess) page and are thus completely unrelated.