Hacker News
by danShumway 2500 days ago
> GitHub Package Registry is a software package hosting service, similar to npmjs.org, rubygems.org, or hub.docker.com, that allows you to host your packages and code in one place. You can host software packages privately or publicly and use them as dependencies in your projects.

I am... really confused by this.

Isn't this just Github? Github is a hosting service that allows you to host your packages and code in one place. It has testing and publishing pipeline support, you can add artifacts/releases, make your packages private or public, host different types of software at the same time, and it's compatible with most existing dependency systems, including NodeJS.

I can see this has more download statistics, which is nice. And it has a policy that artifacts can't be deleted, which is very nice.

Is that it though? I know I have to be missing something; what can I do now that I couldn't already do with Github as is?

They are implementing the APIs that the package managers expect to fetch artifacts, rather than the package managers having to know how to fetch files via Git.
This appears to be a move to meet package managers halfway. Yes, tools like npm and go have great integration with git repositories, but others like NuGet still require a hosting source. Long-term, you could imagine package managers forgoing their own hosting services in favor of letting GitHub be the primary host who takes on the issues of bandwidth, availability, access controls, etc. It's another vector for GitHub to compete in FOSS.
It’s also good for GitHub's economics to be able to offload data from Git itself to easily mirrored, versioned tarballs. It’s much more cost effective than mirroring constantly changing master branches or, worse, whole Git repos for clones.
If you are familiar with Nexus or Artifactory or Verdaccio, which all essentially let you have private NPM repos (among other formats like Maven, etc.), that's what this is.
What's the difference between publishing a binary file on Artifactory and linking to a binary file in a Github release[0]?

Is Artifactory immutable? Or I guess that it handles versioning/publishing better?

[0]: https://help.github.com/en/articles/linking-to-releases

Each dependency management tool has its own nuances about how artifacts should be uploaded and retrieved, and what metadata should be stored alongside them.
Artifactory can also proxy stuff. It's widely used as a corporate proxy for public repos so that you don't rely on the whims of the internets.
In my experience, using git as a dependency source for NPM (including yarn) or Ruby never worked well. It works for a simple case, but it's usually much slower, has issues around managing credentials for private repos, and doesn't have a nice way to publish built files.
Also, most importantly, Git repositories are not immutable, and any package repo that's not immutable is a terrible, terrible idea.
This is one reason I created hashcache [0][1], for referencing remote immutable resources that can be addressed by their cryptographic hash. I used this in my Linux distribution to download source tarballs for every package by their SHA2-256.

[0] https://chiselapp.com/user/rkeene/repository/hashcache/ [1] http://hashcache.rkeene.org/
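
A minimal sketch of the general idea of fetching by hash, assuming nothing about how hashcache itself is implemented: the consumer records a SHA-256 digest ahead of time and rejects any bytes that do not match it. The names `sha256_hex`, `verify_artifact`, `tarball`, and `pinned` are hypothetical and exist only for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bytes:
    """Return data unchanged if its digest matches the pinned one, else raise."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# Hypothetical bytes standing in for a downloaded source tarball.
tarball = b"example source tarball contents"

# In practice the digest is recorded once (e.g. in a package recipe) and
# checked on every later download, whichever mirror served the bytes.
pinned = sha256_hex(tarball)
verified = verify_artifact(tarball, pinned)
```

Because the digest is the address, it does not matter which server delivers the file; a changed file simply fails verification.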

Git repos can be as immutable as you want. You just need to point your package manager to a commit or tag instead of a branch head. If you are worried about a rebase, well, you have that issue with any public artefact store.
The point is that it's not your Git repo, usually, when talking of dependencies, so it's not really about what you want.

SHAs can't be changed, but they can be deleted. And on GitHub, entire projects, usernames, and orgs can be deleted. Or renamed. In case of a user rename, GitHub does maintain redirects for a while, until that username is taken by somebody else.

If that is a big concern you can fork. If you are building production systems with dependencies on eggs you can't find in PyPI, you probably should take control of those in your own copies. I can't recall once that I had to do that for things that I ask money for though... if it's not in PyPI it's probably not worth using. And if it is useful, forking or just copying the module or package into your own code base takes care of any shifting dependencies.

So yeah, this does not seem to be a problem that actually exists.

> If that is a big concern you can fork

Surely you must be joking.

Yes it is a big concern and the solution is to use repositories that aren't so volatile.

Copyright law prevents package repos from being truly immutable.

Fortunately a copyright takedown request is not a typical scenario, but it does happen, even with "immutable" repositories like maven-central.

Once a piece of software is released as open source, it can be freely distributed. And Maven Central packages require an open source license. The author might own the copyright, but he licensed that copyright away when publishing on Maven Central.

In other words a "copyright takedown request" isn't valid, unless the author was in violation of the copyright of somebody else while publishing those packages and this was decided in a court of law.

It might happen, but I have never heard of Maven Central packages being removed.

But I do see GitHub repos being renamed or removed all the time, and I have seen NPM packages removed for no reason other than that the author wanted to, screwing the entire JavaScript ecosystem.

> In other words a "copyright takedown request" isn't valid, unless the author was in violation of the copyright of somebody else while publishing those packages and this was decided in a court of law.

The DMCA process is law. Maven Central (like anyone else who hosts things) has to respond to valid takedown requests (which means taking down content long before any court case; even if a counter-notice is filed the content still has to be taken down temporarily) or else they'd become liable for infringement themselves. It's less common than on GitHub or NPM, sure (which I suspect has more to do with the complexity of Maven Central's registration process than anything else), but it happens and any host on the scale of Maven Central needs a process in place for doing it.

Even bad_user's assertion that "Once a piece of software is [legitimately] released as open source, it can be freely distributed" is not 100% true. There is a mechanism in US copyright law through which copyright holders and their heirs can unilaterally retract copyright grants and licenses 56 years after the initial grant or license.

Granted this is quite the esoteric edge case... at least for now. ;-)

At least for Ruby, it was easy to pin a specific commit SHA in the Gemfile to guarantee immutability.
That does not guarantee it exists at the source repo though. You can’t create different content at that same hash, but you can rewrite history or delete the branch of that hash entirely and it will eventually be GCed away.
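
To illustrate why different content cannot appear under the same hash: Git derives an object's ID from its bytes. The sketch below computes the ID of a blob (file contents) under Git's default SHA-1 object format; commit IDs are derived the same way from the commit's own content (tree, parents, author, message). The function name `git_blob_id` is hypothetical. Note that this only shows the ID is tied to the content; it says nothing about whether the remote still stores the object.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob under its default SHA-1 format.

    Git hashes a header of the form b"blob <size>\\0" followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob.
empty_id = git_blob_id(b"")  # "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# Changing even one byte of the content yields a different ID.
changed_id = git_blob_id(b"\n")
```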
> what can I do now that I couldn't already do with Github as is?

Before this new service how would you use GitHub as a source for installing, for example, Maven packages?

There's https://jitpack.io, but it's a third-party service. Having first-party support for Maven artifacts served by GitHub will be quite nice.
I guess I'm not sure how Maven works then -- I thought it was just downloading package binaries? I would use Github releases for that and link to the binary directly. I'd use a CI to auto-build and publish a new release binary whenever I pushed to master.

Does Maven do something more complicated like automatically figure out which platform binary to pull?

It's the file layout, for just one thing. You can't just point Maven (or many package managers) at a simple HTTP server without the correct layout.

If you could... they wouldn't have built this.
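
A rough sketch of the layout being referred to, assuming the standard Maven repository convention: dots in the group ID become directory separators, followed by the artifact ID, the version, and a file named `artifactId-version.packaging`. The helper name `maven_artifact_path` is hypothetical, and the sketch ignores classifiers, snapshot versions, checksum files, and the `maven-metadata.xml` files a real repository also serves.

```python
def maven_artifact_path(
    group_id: str, artifact_id: str, version: str, packaging: str = "jar"
) -> str:
    """Return the repository-relative path Maven requests for an artifact."""
    group_path = group_id.replace(".", "/")  # org.apache.commons -> org/apache/commons
    filename = f"{artifact_id}-{version}.{packaging}"
    return f"{group_path}/{artifact_id}/{version}/{filename}"


# The JAR and its POM sit side by side in the version directory.
jar_path = maven_artifact_path("org.apache.commons", "commons-lang3", "3.12.0")
pom_path = maven_artifact_path("org.apache.commons", "commons-lang3", "3.12.0", "pom")
# jar_path == "org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar"
```

A host whose download URLs cannot contain this nested directory structure cannot act as a Maven repository without something translating the paths in between.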

Can't you? http://repo.maven.apache.org/maven2/ looks a lot to me like a simple HTTP server.

I'll take your word on it though. I don't know much about how Java package management works, and like you said, I assume the Github team wouldn't waste their time building something that wasn't necessary.

I guess if nothing else it would be a pain in the neck to have to know in advance how release files had to be laid out.

> ...without the correct layout

How are you going to recreate that directory structure with GitHub releases? You can't even have any custom directories - they're just release-name/file-name.

I mean just try recreating it yourself and see how far you get.

You could try to use GitHub Pages instead, but GitHub very actively pesters you if it even looks like you're distributing binaries there.

> Before this new service how would you use GitHub as a source for installing, for example, Maven packages?

http://www.lordofthejars.com/2011/09/questa-di-marinella-e-l...

It may seem less useful for interpreted languages, like JS, where the code is the artifact. For compiled languages, having a separate place to store official artifacts is much more important.
Compiled and interpreted are not mutually exclusive either. In particular, in the JS world it’s very common to transpile from modern JS or TypeScript to lowest-common-denominator JS.