Hacker News new | ask | show | jobs
by epage 172 days ago
> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.

Isn't assigning out what all made things fast presumptive without benchmarks? Yes, I imagine a lot is gained by the work of those PEPs. I'm more questioning how much weight is put on dropping of compatibility compared to the other items. There is also no coverage for decisions influenced by language choice which likely influences "Optimizations that don’t need Rust".

This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.

2 comments

> Isn't assigning out what all made things fast presumptive without benchmarks?

We also have the benchmark of "pip now vs. pip years ago". That has to be controlled for pip version and Python version, but the former hasn't seen a lot of changes that are relevant for most cases, as far as I can tell.

> This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.

This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML. A package installer really only needs to look at it at all when building from source, and part of what these standards have done for us is improve wheel coverage. (Other relevant PEPs here include 600 and its predecessors.) Although that has also largely been driven by education within the community, things like e.g. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated... and https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... .

> This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML.

I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile (which is encoded in TOML) can be somewhat large-ish, maybe pushing 100 kilobytes (to the point where I'm curious if epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph).

But once you have a lock file there is no resolution needed, is there? It lists all needed libs and their versions. Given how toml is written, I imagine you can read it incrementally - once a lib section is parsed, you can download it in parallel, even if you didn't parse the whole file yet.

(not sure how uv does it, just guessing what can be done)

For Cargo,

- synchronization operations are implicit so we need to re-resolve to confirm the lockfile is still valid. We could take some short cut but it would require re-implementing some logic

- dependency resolution only uses `Cargo.toml` for local and git dependencies. Registry dependencies have a json summary of what content is relevant for dependency resolution. Cargo parses nearly every locked package's `Cargo.toml` to know how to build it.

For whatever it's worth, the toml library uv uses doesn't support streaming parsing: https://github.com/toml-rs/toml/issues/326
I'm not sure if it even makes sense for a TOML file to be "read incrementally", because of the weird feature of TOML (inherited from INI conventions) that allow tables to be defined in a piecemeal, out-of-order fashion. Here's an example that the TOML spec calls "valid, but discouraged":

    [fruit.apple]
    [animal]
    [fruit.orange]
So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".
I don't think that's worse than having to search an arbitrary distance for a matching closing bracket. There are tasks where you can start working knowing that a given array in the data might be appended to later (similarly for objects).
TOML as a format doesn't make sense for streaming

- Tables can be in any order, independent of heirarchy

- keys can be dotted, creating subtables in any order

On top of that, most use cases for the format are not benefitted by streaming.

Lockfiles aren't an issue. It is all the dependencies themselves.
To be fair, the whole post isn't very good IMO, regardless of ChatGPT involvement, and it's weird how some people seem to treat it like some kind of revelation.

I mean, of course it wasn't specifically Rust that made it fast, it's really a banal statement: you need only very moderate serious programming experience to know, that rewriting legacy system from scratch can make it faster even if you rewrite it in a "slower" language. There have been C++ systems that became faster when rewritten in Python, for god's sake. That's what makes system a "legacy" system: it does a ton of things and nobody really knows what it does anymore.

But when listing things that made uv faster it really mentions some silly things, among others. Like, it doesn't parse pip.conf. Right, sure, the secret of uv's speed lies in not-parsing other package manager's config files. Great.

So all in all, yes, no doubt that hundreds of little things contributed into making uv faster, but listing a few dozens of them (surely a non-exhaustive lists) doesn't really enable you to make any conclusions about the relative importance of different improvements whatsoever. I suppose the mentioned talk[0] (even though it's more than a year old now) would serve as a better technical report.

[0] https://www.youtube.com/watch?v=gSKTfG1GXYQ

That's true. Probably not compiling and using HTTP Range made a big difference, but we don't know, do we? Quantifying these differences, rather than just listing them, would make the post much more valuable.