| HN Mirror


	by toomuchtodo 2141 days ago
	Or store the containers in the Internet Archive alongside the paper. They’re just tarballs. Lots of options as long as you're comfortable with object storage.

2 comments

brutos 2141 days ago

This still means that tools published over the last few years might just be gone soon. The people who uploaded the images might have graduated or moved on, and no one will be there to save the work.

icebraining 2141 days ago

Sounds like a job for the Archive Team, as long as there's some way to identify the images worth saving.

maxfan8 2141 days ago

Yep, just mentioned it to the Archive Team IRC. We're probably going to selectively archive particular Docker images, although that's a lot of manual labor.

If you have any ideas wrt selecting important images, that'd be great.

thebouv 2141 days ago

Rough idea: maintain an Awesome List of images worth saving, take submissions from public, use that list to automate what to pull?
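
A minimal sketch of the automation step, assuming the list is a plain text file with one image reference per line and `#` comments. The file format, the `build_archive_commands` name, and the `archive` output directory are all hypothetical. It only builds the `docker pull` and `docker save` command lines; running them is left to the caller.

```python
import re


def build_archive_commands(list_text, out_dir="archive"):
    """Return (pull_argv, save_argv) pairs for each image in the list."""
    commands = []
    for line in list_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        image = line.split("#", 1)[0].strip()
        if not image:
            continue
        # Derive a filesystem-safe tarball name from the image reference.
        safe = re.sub(r"[^\w.-]", "_", image)
        pull = ["docker", "pull", image]
        save = ["docker", "save", "-o", f"{out_dir}/{safe}.tar", image]
        commands.append((pull, save))
    return commands
```

Each argv list could then be handed to something like `subprocess.run(argv, check=True)`, and the resulting tarballs uploaded wherever the archive lives.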

maxfan8 2141 days ago

Yeah, good idea — I’m not in these fields so it’s difficult for me to judge. Also, it sounds like we should be prioritizing niche images that only a handful of papers use rather than images that people rely upon regularly.

cosmie 2141 days ago

Couldn't you bootstrap a list by searching/parsing the Archive dataset itself? Searching for

A) "docker pull" commands, parsing the text that comes after them based on the command's syntax[1] to extract instructional references to images such as "docker pull ubuntu:latest", and

B) links/text beginning with "https://hub.docker.com/_/" to identify informational references to image base pages such as (https://hub.docker.com/_/ubuntu)

[1] https://docs.docker.com/engine/reference/commandline/pull/
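
A rough sketch of both searches over a blob of text, assuming the documents are already available as plain strings. The function names are made up for illustration, and the option handling is a simplification: it skips flags and the value of `--platform`, nothing more.

```python
import re

# Start of a "docker pull" invocation; the arguments follow on the same line.
PULL_START_RE = re.compile(r"docker[ \t]+pull[ \t]+")
# Characters that can appear in NAME[:TAG|@DIGEST].
IMAGE_RE = re.compile(r"^[\w.\-/:@]+")
# Official-image pages, as in https://hub.docker.com/_/ubuntu
HUB_RE = re.compile(r"https://hub\.docker\.com/_/([\w.-]+)")


def images_from_pull_commands(text):
    """Collect image references that follow 'docker pull' in free text."""
    found = set()
    for match in PULL_START_RE.finditer(text):
        rest_of_line = text[match.end():].split("\n", 1)[0]
        skip_next = False
        for token in rest_of_line.split():
            if skip_next:              # value of a flag such as --platform
                skip_next = False
                continue
            if token == "--platform":
                skip_next = True
                continue
            if token.startswith("-"):  # any other flag
                continue
            image = IMAGE_RE.match(token)
            if image:
                # Trim punctuation that belongs to the sentence, not the image.
                found.add(image.group(0).rstrip(".,:"))
            break                      # only the first non-flag token counts
    return found


def images_from_hub_links(text):
    """Collect official image names from hub.docker.com/_/ links."""
    return {name.rstrip(".") for name in HUB_RE.findall(text)}
```

Running both over each document and taking the union gives a candidate list; it would still need deduplication of tags and a manual look for false positives.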

contravariant 2141 days ago

Since images tend to be based on each other I wonder if someone's analyzed the corresponding dependency graph yet. In theory you should get quite far if you isolate the most commonly used base images.
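
One way to approximate that graph, assuming the Dockerfiles for the images are available as text (which is not a given for everything on the registry). The `base_images` and `rank_bases` names and the input mapping are hypothetical. The sketch reads `FROM` lines, ignores `scratch` and references to earlier build stages, and counts how many images use each base.

```python
import re
from collections import Counter

# FROM [--platform=<platform>] <image> [AS <name>]
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE | re.MULTILINE,
)


def base_images(dockerfile_text):
    """Return the external base images named in a Dockerfile's FROM lines."""
    stages = set()
    bases = []
    for image, alias in FROM_RE.findall(dockerfile_text):
        # Skip the empty base and references to earlier build stages.
        if image != "scratch" and image.lower() not in stages:
            bases.append(image)
        if alias:
            stages.add(alias.lower())
    return bases


def rank_bases(dockerfiles):
    """Count, per base image, how many Dockerfiles build on it."""
    counts = Counter()
    for text in dockerfiles.values():
        counts.update(set(base_images(text)))
    return counts.most_common()
```

The top of that ranking is the set of widely shared bases; the long tail is where the niche images live.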

CameronNemo 2141 days ago

Are those not the images that are basically guaranteed to stay on Docker Hub?

toomuchtodo 2141 days ago

“Guaranteed” is a strong word.

captn3m0 2141 days ago

Quay is another alternative registry.
