by mofeing 1363 days ago
hey,

We ran into the same problem (a supercomputer with clusters of different architectures and no outgoing connections permitted), so we created "pypickup" [1,2]. nice to see that we came up with similar solutions! I have some questions:

1. is the directory of packages you create compatible with PEP 503? (so I can use the `--index-url file://PATH_TO_LOCAL_CACHE` flag with pip and it should work)

2. is there some filtering mechanism? e.g. we are not interested in non-release versions ("dev" versions, "rc" versions, "post" versions, ...)

3. I guess that the way morgan resolves dependencies is by manually parsing files like "pyproject.toml" or "requirements.txt" and it does not ask the build-system for the dependencies. if so...

   - does "morgan" detect build-dependencies?

   - which build-systems are compatible?

   - is "morgan" capable of detecting more complex dependency specifications? e.g. "oldest-supported-numpy", which is used by "scipy", has dependency strings like the following: `numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' and platform_python_implementation != 'PyPy'`

kudos for the good work

[1] https://pypi.org/project/pypickup/ [2] https://github.com/UB-Quantic/pypickup


Too bad your project didn't come up in any of my searches while researching this problem. Probably because it doesn't use the word "mirror" at all :)

As for your questions:

1. I don't see any mention of directory structures in PEP 503. The Morgan server does implement PEP 503 though. In any case, I tried installing now straight from the directory and it didn't work. Are you sure you meant PEP 503?

2. Where Morgan differs from pypickup, as far as I can see, is that it interprets requirement strings as per PEP 508 (e.g. "requests>=2.40.0; python_version < '3.8'") instead of providing a command such as `pypickup add requests`. For every requirement string, it looks for the latest version on PyPI that satisfies it, and downloads that version. You can filter _in_ the requirement strings; other than that, Morgan doesn't have any specific handling of dev/rc/etc.

3. Morgan detects and downloads the build system based either on the [build-system] section of pyproject.toml, or the setup_requires.txt file (from setuptools). These are the sources currently supported. It doesn't actually care what the build system is, it simply attempts to find where it is defined and download it as well.

As for complex dependency specifications, yes, they are supported and honored (Morgan relies on the "packaging" library to properly evaluate those). By the way, I recently moved from Poetry to Hatch for managing the Morgan project itself specifically because I got fed up with Poetry not honoring those specifications, and trying to download completely irrelevant packages.
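To make the marker evaluation concrete, here is a minimal sketch of how the "packaging" library can evaluate the requirement string quoted in the question. It is not Morgan's actual code: the helper name `applies` and the two target environments are made up for illustration, and the fallback import only exists so the snippet runs where "packaging" is not installed separately.

```python
# Sketch only: evaluates a PEP 508 requirement string with "packaging".
# This is not Morgan's code; the target environments below are made up.
try:
    from packaging.requirements import Requirement
except ImportError:
    # Fallback to the copy vendored inside pip (not a public API).
    from pip._vendor.packaging.requirements import Requirement

req = Requirement(
    "numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' "
    "and platform_python_implementation != 'PyPy'"
)

# Hypothetical target environments; keys not listed here are taken
# from the interpreter that runs this script.
arm_cpython = {
    "python_version": "3.8",
    "platform_machine": "aarch64",
    "platform_python_implementation": "CPython",
}
x86_cpython = dict(arm_cpython, platform_machine="x86_64")


def applies(requirement, environment):
    """Return True if the requirement's marker holds in the environment."""
    return requirement.marker is None or requirement.marker.evaluate(environment)


print(req.name, str(req.specifier))
print(applies(req, arm_cpython))
print(applies(req, x86_cpython))
```

With these two environments the marker holds for the aarch64 target and not for the x86_64 one, so a mirror built for the second target would skip this pin.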

Well, we first named it "pypi-cache" but there is a package named "pypicache" from the year 2007 and we had to rename it. We always thought of it as a "cache" rather than a "mirror"... but yes, "mirror" is more appropriate. Btw we released it just 1 week ago, which may also be why you did not find it.

1. Well, the flag "--index-url" explicitly says that "... should point to a repository compliant with PEP 503 (the simple repository API) or a local directory laid out in the same format". PEP 503 defines the directory structure where there is a folder per package, an "index.html" on the root with a link to each package and *an "index.html" in each package folder that has a link per available file*.

URLs are not limited to "https", they can also be relative paths. So the trick we do is to download the file to the folder of the package and add an anchor to that file in the "index.html" of the package. For example,

If you go to https://pypi.org/simple/numpy, you will find links like the following: <a href="https://files.pythonhosted.org/packages/f6/d8/ab692a75f584d1..." data-requires-python=">=3.8">numpy-1.22.4.zip</a>

But we download it and write, <a href="./numpy-1.22.4.zip" data-requires-python=">=3.8">numpy-1.22.4.zip</a>

This is especially important for us because we cannot set up any kind of server.
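The layout described above can be sketched with the standard library alone. This is an illustration, not pypickup's code: the function names `page` and `write_indexes` are hypothetical, the project folders are assumed to already use normalized names, and attributes such as `data-requires-python` are left out for brevity.

```python
import html
from pathlib import Path


def page(title, links):
    """Render a minimal HTML page with one anchor per (href, text) pair."""
    anchors = "\n".join(
        f'    <a href="{html.escape(href)}">{html.escape(text)}</a><br/>'
        for href, text in links
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"  <head><title>{html.escape(title)}</title></head>\n"
        f"  <body>\n{anchors}\n  </body>\n</html>\n"
    )


def write_indexes(root):
    """Write index.html files for a root that holds one folder per project."""
    root = Path(root)
    projects = sorted(p for p in root.iterdir() if p.is_dir())
    for project in projects:
        files = sorted(
            f.name
            for f in project.iterdir()
            if f.is_file() and f.name != "index.html"
        )
        # Relative hrefs keep the mirror usable straight from the filesystem.
        (project / "index.html").write_text(
            page(f"Links for {project.name}", [(f"./{n}", n) for n in files])
        )
    # The root page links to each project folder.
    (root / "index.html").write_text(
        page("Simple index", [(f"./{p.name}/", p.name) for p in projects])
    )
```

After running `write_indexes` on the mirror directory, pointing pip at it with `--index-url file://PATH_TO_LOCAL_CACHE` should be enough, with no server involved.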

2. Okay nice. Yep, we thought that parsing would be more difficult and that relying on it would be problematic, because of the different build-systems and because many packages still do not have a "pyproject.toml" file. We opted for a manual approach in which you do "pypickup add" until you have no more "dependency missing" errors. Your approach looks much better to me but, like you said, it is limited to "pyproject.toml" and "setuptools" right now.

Btw, does it also download extra dependencies?

3. Nice. I also stopped using Poetry for things like that, but now I manually write my "pyproject.toml" with "setuptools".

I like the idea of trying to parse the dependencies. I will probably try something, but since we download all files (filtering some of them), it would be more costly. Maybe in some weeks when I'm more free.

Ahh, I get it, it needs index.html files. I can easily implement this. I did want the server, though: I wanted the mirror to be easily accessible from multiple machines, I wanted to implement the JSON API, and in an upcoming version I want to allow uploading private packages to the mirror.

As for extra dependencies, yes, they will be mirrored, but only if relevant, i.e. if they are included in a requirement string (be it a direct requirement or a dependency of a dependency).
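A rough sketch of that rule, again using "packaging" and not taken from Morgan itself: a dependency guarded by an `extra` marker is only picked up when the requirement string asks for that extra. The helper `wanted` and the metadata line `dep` are illustrative assumptions.

```python
# Sketch only: decides whether an optional dependency is relevant.
try:
    from packaging.requirements import Requirement
except ImportError:
    # Fallback to the copy vendored inside pip (not a public API).
    from pip._vendor.packaging.requirements import Requirement


def wanted(dependency, requested_extras):
    """Return True if a dependency applies for any of the requested extras."""
    marker = Requirement(dependency).marker
    if marker is None:
        return True
    # An empty string stands for "no extra requested".
    return any(marker.evaluate({"extra": e}) for e in requested_extras or {""})


# Illustrative metadata line for an optional dependency.
dep = "PySocks>=1.5.6; extra == 'socks'"

plain = Requirement("requests")
with_extra = Requirement("requests[socks]")

print(wanted(dep, plain.extras))
print(wanted(dep, with_extra.extras))
```

Here the optional dependency is skipped for the plain requirement and kept for the one that names the extra.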

Ahh ok. In our case all the machines have a shared network filesystem where we store the mirror.

Great about the extras.

Would you mind if we reference each other in the readmes?

Yeah sure, no problem.