Hacker News
by nightshift1 760 days ago
Interesting read. Thanks for sharing. Maybe it is from lack of experience, but I always treat tarballs like a loaded gun. I extract them inside an empty subdirectory in my home first just to be sure and then move the data as required. It is no fun having to clean up the mess left by an incorrect extraction.
6 comments

This is a good and safe practice. And I think most people do it this way after cleaning up the mess of a badly built tarball at least once :)
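For anyone who wants to script that habit, here is a rough Python sketch, not a definitive tool. The helper name `extract_to_staging` and the `untar-` prefix are made up for illustration; the only assumption about the library is that `tarfile` exposes `data_filter` on Pythons that support extraction filters, which is the feature check the stdlib docs suggest.

```python
import io
import os
import tarfile
import tempfile


def extract_to_staging(archive_path, parent=None):
    """Extract archive_path into a brand-new empty directory; return its path.

    Hypothetical helper: nothing is moved to its final location here.
    """
    staging = tempfile.mkdtemp(prefix="untar-", dir=parent)
    # Use the stdlib "data" extraction filter when this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path) as tar:
        tar.extractall(staging, **kwargs)
    return staging


# Demo: a "messy" tarball with no top-level directory of its own.
work = tempfile.mkdtemp()
archive = os.path.join(work, "messy.tar")
with tarfile.open(archive, "w") as tar:
    for name in ("README", "Makefile", "main.c"):
        data = b"demo\n"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

staging = extract_to_staging(archive, parent=work)
# The mess stays contained in the staging directory until you move it.
print(staging, sorted(os.listdir(staging)))
```

After looking at what landed in the staging directory, you move only what you want to where it belongs.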
I'd argue this is a design bug. Extracting into the current directory should either not be possible or be the exception (tar xvfz --current-directory). I'm not singling out tar here: unzip, pkunzip, etc. all have this issue, and all have caused people data loss and worse because of this unsafe default behavior.
The fact that any program will overwrite files by default is also terrible.
Unless there is only one folder in the archive and it's not overwriting anything. Then it should be extracted into the current directory so you don't get nested dupes.
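That rule is easy to write down. A minimal Python sketch, assuming member names use `/` separators as tar does; `top_level_entries` and `should_extract_in_place` are hypothetical names, and treating absolute or `..` names as "never in place" is my own conservative addition, not something stated above.

```python
def top_level_entries(names):
    """Return the set of first path components among archive member names."""
    roots = set()
    for name in names:
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            roots.add(parts[0])
    return roots


def should_extract_in_place(member_names, existing_names):
    """True if the archive has one top-level entry that clobbers nothing.

    existing_names: names already present in the target directory.
    """
    names = list(member_names)
    # Assumption: absolute or parent-climbing names always get a wrapper dir.
    if any(n.startswith("/") or ".." in n.split("/") for n in names):
        return False
    roots = top_level_entries(names)
    if len(roots) != 1:
        return False
    (root,) = roots
    return root not in set(existing_names)


# Well-built tarball: one root folder, nothing in the way.
print(should_extract_in_place(["proj/", "proj/main.c"], existing_names=[]))
# Badly built tarball: files straight at the top level.
print(should_extract_in_place(["main.c", "Makefile"], existing_names=[]))
# One root folder, but it already exists in the target.
print(should_extract_in_place(["proj/main.c"], existing_names=["proj"]))
```

When the function returns False, the caller would create a fresh containing directory and extract there instead.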
Lots of things look like a design bug today but were a prudent choice for the environment at the time.

Like when I have an electrician at my house. He says the old way was dumb... but it was fully up to code in 1954.

Time makes fools of us all.

That might be true, but I've been using arc, tar, pkzip since the 80s, and even then I lost work and had to clean up on floppy disks because of this issue. I suppose the prudent thing is to list the files before decompressing.
Sure (and me as well) but, design bug? Maybe just different expectations of the user.

On that point, some heavy machinery took years before adding safety guards (some even required legislation before improving safety)

Even `rm -rf *` warns you before just doing it.
I have 30 years of professional experience, and with one-off tarballs that I'm not deeply familiar with, this is usually what I do as well (certainly with a tarball that has a /usr-like structure inside of it). You're good.
Most distros should have the useful package "atool", which contains the command "aunpack". It does the least surprising thing for all archive types (without creating a duplicate root directory when one already exists).
I believe the unar tool creates a containing directory by default.
Extracting archives directly into your system root as a superuser is in the same class of activity as piping curl output into your shell interpreter as a superuser: things that no one should ever do.
> I extract them inside an empty subdirectory in my home first

... AFTER a "tvf[jzx]", I hope

What's the gain of -t if the extraction target is disposable?

Does -x have some side effect that -t would list for you?

It's nice to check if you're about to extract a 1MB tarball into 2TBs of data before actually running out of disk space.
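
The same check can be done without extracting anything by adding up the sizes recorded in the member headers. A hedged Python sketch: `unpacked_size` and `build_zero_tarball` are illustrative names, and the numbers only reflect what the headers declare. Listing a compressed archive still means reading through all of it, just not writing it to disk.

```python
import io
import tarfile


def unpacked_size(tar):
    """Sum the header-declared sizes of the regular files in an open archive."""
    return sum(m.size for m in tar.getmembers() if m.isreg())


def build_zero_tarball(n_bytes):
    """Build an in-memory .tar.gz holding one file of n_bytes zero bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("zeros.bin")
        info.size = n_bytes
        tar.addfile(info, io.BytesIO(bytes(n_bytes)))
    return buf.getvalue()


blob = build_zero_tarball(1024 * 1024)
with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
    total = unpacked_size(tar)

# Compare the compressed size on disk with what extraction would write.
print(f"compressed: {len(blob)} bytes, unpacked: {total} bytes")
```

A wrapper could refuse to extract when `total` exceeds the free space reported by `shutil.disk_usage` for the target.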

Most tar programs do prevent extracting tarballs containing absolute paths (like '/etc/passwd') and relative paths (like '../../../etc/passwd'), but older tar programs still allow that. And programs written in Go, because of course: https://github.com/golang/go/issues/55356

Overall, if your HDD size is infinite and you're using GNU tar, or another recent tar, I think you can skip 't' before doing a '-C' extraction into some safe directory.

How do you create a tar archive that "contains absolute paths"?
The tar file format doesn't prevent you from specifying absolute paths in the archive. It's up to the tool extracting the archive to reject/ignore such paths.
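To make "reject/ignore such paths" concrete, here is a small Python sketch of the check an extractor might run on each member name before writing anything. The function names are hypothetical, the rule is deliberately conservative (any `..` component is refused, even one that would stay inside the target), and it does not cover symlink tricks.

```python
import io
import tarfile


def is_unsafe_member_name(name):
    """True for absolute names or names with a '..' path component."""
    if name.startswith("/"):
        return True
    return ".." in name.split("/")


def unsafe_members(tar):
    """Return the names in an open archive that an extractor should refuse."""
    return [m.name for m in tar.getmembers() if is_unsafe_member_name(m.name)]


# Build an in-memory archive mixing ordinary and hostile names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("project/main.c", "/etc/passwd", "../../../etc/passwd"):
        data = b"x\n"
        info = tarfile.TarInfo(name)  # the name is stored as given
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    flagged = unsafe_members(tar)

print(flagged)
```

Python's own `tarfile` ships extraction filters for this purpose in recent versions, so in real code that is probably a better starting point than a hand-rolled check.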
I asked about options for GNU tar because there is a bit of strange behavior.

To add absolute paths to an archive, there is "-P" option, and man says it works only for creating archives: "Don't strip leading slashes from filenames when creating archives".

To extract absolute paths from the archive, you need to add the "-C /" option, and although the tool says "tar: Strip leading `/' from member names", it will still extract it in the right place because the paths become relative and -C puts them in the root.

However, if you add "-P" during the extraction (which is not mentioned in man), the "strip leading slashes" information disappears.

So if this message bothers someone, "tar -C / -xPf file.tar" will cleanly extract absolute paths from the archive ;)

The first field in the tar header is

    char name[100];
(See https://man.archlinux.org/man/tar.5.en )

So anything that will write an absolute path there will do, including literally opening the file in a text editor and replacing the path by hand, because that whole header is just fixed-length ASCII with null-terminated strings.

(I mean I assume the tar(1) command can do it too but you don't need that, the format is dead simple, if weird.)
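
If you'd rather poke at it from Python than a hex editor, this sketch writes a one-file archive with an absolute name and reads the `name` field straight back out of the first 100 bytes. Helper names are invented for illustration, and I force the ustar format so the first 512-byte block is the file's own header.

```python
import io
import tarfile


def build_tar_with_name(name):
    """Return the raw bytes of a ustar archive holding one small file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        data = b"x\n"
        info = tarfile.TarInfo(name)  # no slash stripping happens here
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def first_header_name(raw):
    """Decode the NUL-padded name field: bytes 0..99 of the first header."""
    return raw[:100].split(b"\0", 1)[0].decode("ascii")


raw = build_tar_with_name("/tmp/absolute.txt")
print(first_header_name(raw))
# The rest of the 100-byte field is just NUL padding.
print(raw[:100].count(b"\0"))
```

Nothing in the format objected to the leading slash; whether it is honoured is entirely up to whatever extracts the file.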

It's a good exercise to open one of these files in hexdump or something to get a feel for what's really going on inside... but yeah, GNU tar has -P / --absolute-names to just create them with leading slashes.
Yeah, then I just hope I get the `mv` or `cp` right when I'm done, and don't end up with a directory full of files from the top part of the tar...