Rm does not permanently delete files (medium.com)
35 points by harshasrinivas 3681 days ago
17 comments

This functionality already exists in GNU coreutils, it's done by the shred(1) command. No need to install any extra third-party software, shred is already installed.

Also, as another commenter already pointed out, this kind of in-place overwrite is not guaranteed to work on SSDs, and it's also not guaranteed to work on filesystems with copy-on-write semantics. If you're really concerned with this, you should be doing full-disk encryption.

> This functionality already exists in GNU coreutils, it's done by the shred(1) command. No need to install any extra third-party software, shred is already installed.

That's only true for GNU-based userlands, not BSD-based ones.

> Also, as another commenter already pointed out, this kind of in-place overwrite is not guaranteed to work on SSDs

And if the system properly TRIMs it might not be necessary at all, though that greatly depends on the SSD.

> That's only true for GNU-based userlands, not BSD-based ones.

Sure, but installing (a subset of) GNU coreutils is probably going to pull in a lot fewer dependencies than this JavaScript command line tool. Plus, you can use ports, no need to mess with a separate package manager (npm) and the associated package verification foibles.

> And if the system properly TRIMs it might not be necessary at all, though that greatly depends on the SSD.

The "depends on the SSD" is a big one. Various recent forensic papers have shown that it can take a while until a TRIM'd sector is actually erased by the firmware.

I still think that if this kind of thing causes worries, full-disk encryption is really the only sensible solution.

> Sure, but installing (a subset of) GNU coreutils is probably going to pull in a lot fewer dependencies than this JavaScript command line tool.

No objection here. Though the original note is right, you just pointed to the wrong tool:

> I still think that if this kind of thing causes worries, full-disk encryption is really the only sensible solution.

And full agreement there.

"That's only true for GNU-based userlands, not BSD-based ones. "

  RM(1)                     BSD General Commands Manual                    RM(1)
...

  -P          Overwrite regular files before deleting them.  Files are
              overwritten three times, first with the byte pattern 0xff,
              then 0x00, and then 0xff again, before they are deleted.
> That's only true for GNU-based userlands, not BSD-based ones.

"rm -P" with the same caveats as the GNU implementation.

For BSD-based systems there is srm (secure remove), which overwrites, renames, and truncates before unlinking.

https://www.freshports.org/security/srm/

Looking at man shred, apparently one can shred stdout.
"Shredding stdout" sounds like the highest compliment for a command line utility.
Why not? One can shred any file descriptor. :-)

I bet one could even shred a network socket.

Is there any advantage to specifically disabling that?
And yet it doesn't "work":

  dtal@reepicheep:~$ cat /dev/urandom | shred -
  shred: -: invalid file type
Your command points stdin to the output of some random command. Even if you actually redirected stdout (pipe to cat, not from), it still wouldn't work, because cat's stdin isn't a file.

You need to point stdout to an actual file, like this:

    shred > /my/secret/file.txt
Why would you do this? At the end of a long batch of commands that all write to some temporary file opened on stdout. Not uncommon in shell-land.
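
To illustrate the "isn't a file" point: whether a descriptor refers to a regular file, a pipe, or a terminal can be read off fstat(2). Below is a minimal Python sketch of that kind of check. `describe_fd` is a made-up helper name, and this is not shred's actual code, just the general idea behind an "invalid file type" style of error.

```python
import os
import stat
import tempfile


def describe_fd(fd):
    """Classify an open file descriptor by the type bits in its st_mode."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


if __name__ == "__main__":
    # A pipe end, like stdout in `some-command | cat`.
    read_end, write_end = os.pipe()
    print("pipe end:", describe_fd(write_end))
    os.close(read_end)
    os.close(write_end)

    # A real file, like stdout in `some-command > /my/secret/file.txt`.
    with tempfile.TemporaryFile() as f:
        print("temp file:", describe_fd(f.fileno()))

    # Whatever stdout happens to be for this run (terminal, pipe, or file).
    print("stdout:", describe_fd(1))
```

Only the redirected-to-a-file case has contents on disk that could be overwritten in place; a pipe or terminal has nothing to shred.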
This is the type of program that should be written in C, not in Javascript with 9 different dependencies...
This is the sort of program that has been written in C, and is probably already included on most systems.
I read comments before clicking the link and thought you were kidding... At least it doesn't run in the cloud and require files to be uploaded there.

Though, on second thought, this cloud service idea has some nice potential for evil :)

Atwood's Law: any application that can be written in JavaScript, will eventually be written in JavaScript.

// quoting https://blog.codinghorror.com/the-principle-of-least-power/

True. But no harm in getting it off the ground quickly before optimising. Much of Git was originally written as shell and Perl scripts, and ported bit by bit to C. I think this shows it can be a good way to work -- prototype quickly, then optimise.
As pointed out elsewhere, this functionality already exists in Unix userland tools, like `rm -P` or (for GNU) `shred`.
This. You can bicker all day long about the "correct" language to write something in but at the end of that long day of going back and forth nothing actually got written. Better to go with what you know and prototype then weigh the benefits of rewriting/refactoring it in another language/toolchain.
Except, of course, when it is already written and people are just ignorant of tools that have been around for many many years.
Author himself stated that rm has a -P flag, that actually does the same thing more correctly and more securely:

>Files are overwritten three times, first with the byte pattern 0xff, then 0x00, and then 0xff again, before they are deleted.

Plus the -P flag is available on both GNU and BSD versions of rm. Somehow I fail to see the user-friendliness factor.

Edit: formatting

On OpenBSD, -P overwrites once. It used to be 3, like the author states, but was changed to one pass since multiple overwrites are pointless on mechanical disks. On SSDs you need ~20 passes of the entire drive to remove ~all data, so I doubt 3 passes of a single file on an SSD would accomplish what the person wants.

As an aside, it has never been demonstrated that multiple overwrites improve overwriting. In other words, it's never been demonstrated that data overwritten just once can be recovered. Until that happens I'll agree with other folks that multiple overwrites are a waste of time and electricity, and that FDE is a much more reasonable (not fool-proof, just reasonable) way to make data unavailable to unauthorized persons.

That actually seems to be a BSD extension not supported by GNU coreutils/fileutils.
> Plus the -P flag is available on both GNU ...

    [pritam@PritePad ~]$ cat /etc/lsb-release 
    LSB_VERSION=1.4
    DISTRIB_ID=Arch
    DISTRIB_RELEASE=rolling
    DISTRIB_DESCRIPTION="Arch Linux"
    [pritam@PritePad ~]$ touch t
    [pritam@PritePad ~]$ rm -P t
    rm: invalid option -- 'P'
    Try 'rm --help' for more information.
`man rm` does suggest looking at shred(1) in the SEE ALSO section
The funny thing is, this doesn't even call fsync(2) (which is available in Node [1]). So the file contents will likely actually remain on disk for some seconds to minutes thereafter, depending on the OS, file system, and their configuration.

[1] https://nodejs.org/api/fs.html#fs_fs_fsync_fd_callback
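
For comparison, a minimal sketch of what "overwrite, flush, then unlink" looks like, written in Python rather than Node. The names `zero_fill` and `scrub` are hypothetical and this is not skrub's code. It assumes a POSIX-like system and a filesystem that overwrites in place, so the SSD and copy-on-write caveats raised elsewhere in the thread still apply.

```python
import os
import tempfile


def zero_fill(path, chunk_size=1 << 20):
    """Overwrite an existing regular file with zeros and fsync it.

    Returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)
    # O_WRONLY without O_TRUNC: write over the existing contents instead of
    # truncating the file and allocating fresh blocks.
    fd = os.open(path, os.O_WRONLY)
    try:
        remaining = size
        while remaining > 0:
            written = os.write(fd, zeros[: min(chunk_size, remaining)])
            remaining -= written
        # Ask the OS to push the zeros from the page cache to the device
        # before the name is removed.
        os.fsync(fd)
    finally:
        os.close(fd)
    return size


def scrub(path):
    """Zero-fill, flush, then unlink (hypothetical helper)."""
    zero_fill(path)
    os.unlink(path)


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"not so secret")
    os.close(fd)
    scrub(demo_path)
    print("still exists:", os.path.exists(demo_path))
```

The fsync call is the only part that differs from a naive "write zeros and delete"; without it the zeros may sit in the page cache while the old blocks are still on disk.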

Or years if the OS crashes afterwards.

I also wonder what happens if the file is deleted before the zeros are flushed. Is there some implicit flush, triggered by metadata changes, which saves the day? No idea.

Neat idea, but defaulting to recursive force mode (-rf) is frankly a bit scary (and irresponsible). A great way to accidentally shoot your own foot off after you've put it through a meat grinder.

I pity the fool that tries to use this programmatically with an unset or null var, e.g.

  skrub /$imfucked
If you're going to do it in a script you'll need to set those flags anyway, won't you?
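
Either way, a wrapper can refuse the obviously dangerous cases before anything gets deleted. A minimal Python sketch of such a guard follows. `checked_target` is a hypothetical helper, not something skrub provides, and it only catches an empty argument or one that resolves to the filesystem root.

```python
import os


def checked_target(path):
    """Return the resolved path, or raise ValueError for dangerous targets."""
    if not path or not path.strip():
        raise ValueError("empty path (unset variable?)")
    resolved = os.path.realpath(path)
    # "/" + "" is still "/", which is what an unset variable expands to
    # in something like /$var.
    if resolved == os.path.abspath(os.sep):
        raise ValueError("refusing to operate on the filesystem root")
    return resolved


if __name__ == "__main__":
    # Hypothetical variable name, assumed not to be set in the environment.
    suffix = os.environ.get("SOME_UNSET_VARIABLE_FOR_DEMO", "")
    try:
        checked_target("/" + suffix)
    except ValueError as exc:
        print("refused:", exc)
```

This does nothing about globs that expand to more than intended; it only covers the unset-variable case from the example above.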
> Skrub supports file globbing

Why? This is already provided by the shell.

Defaulting to "rm -rf *" makes me imagine what could possibly go wrong.
Pretty sure that was just an example to demonstrate that skrub uses the -rf options by default.
Looking forward to skrub-as-a-service
>> Before removing your files, skrub first floods the file with zero-filled bytes.

Should have used /dev/random, and done it 7 times, then you could sell this to enterprise customers!

When you decided to write this in JavaScript you should have thought if you should instead of if you could. ;P
We are JavaScript. We are the Collective. You will be assimilated. Resistance is futile.
If I've understood correctly when I read about this before, programs that overwrite before unlinking do not work for an SSD. The SSD probably will not write the data to the same location as the old file. Instead, use fstrim to have the disk reclaim all free space. After that it's supposed to be impossible to recover.

https://en.m.wikipedia.org/wiki/Trim_(computing)

> Instead, use fstrim to have the disk reclaim all free space. After that it's supposed to be impossible to recover.

That's not quite true. TRIM simply tells the SSD that the corresponding block is not in use anymore, it doesn't tell the SSD what to do with it.

The controller will usually unmap the physical block and schedule it for erasure but usually not erase it immediately unless it doesn't have any free block to remap. And it will return zeroes if the block is read.

The data is recoverable at that point (until the block is actually erased) and can remain so for a fairly long time[0] if the attacker can either bypass the SSD controller or can physically access the raw flash memory.

[0] depending on storage pressure and the exact make and recycling strategy of the SSD

Why is Skrub implemented in JavaScript? Do I need an entire nodejs stack setup to use it?
The old hard-drive lore was that files may still be recoverable until the contents have been written 7 times - and the government had tools that could recover previously overwritten data.

Is this still true with SSDs?

FWIW: This seems to mostly be lore. See http://www.nber.org/sys-admin/overwritten-data-guttman.html

Note also "Since writing the above, I have noticed a comment attributed to Gutmann conceding that overwritten sectors on "modern" (post 2003?) drives can not be read by the techniques outlined in the 1996 paper, but he does not withdraw the overwrought claims of the paper with respect to older drives."

(the comment is at http://seclists.org/bugtraq/2005/Jul/464)

As for SSDs, it's harder to say because of wear leveling and bad sector replacement. They internally have their own translation layers, and so the overwritten data may not end up where the original data was (this is itself a complicated topic; http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pag... is a reasonable intro).

However, one neat thing about them is that there are plenty of SSDs that do hardware encryption by default. When they get init'd, they set the key. You can reset the key, and then, pretty much, assuming the SSDs have not been backdoored by our security agencies, it doesn't matter what you do ;)

This assumes you just want to destroy all the data though, not just some of it.

> and the government had tools that could recover previously overwritten data.

That wasn't true. But since we can't prove it's not true, precaution says you should overwrite drives a few times with pseudo-random data.

SSDs are a bit more worrying because end users don't have full control.

But with mechanical drives or SSDs, if the data is that important you should be looking at physical destruction rather than just overwriting.

Unlikely, because while hard drives use magnetic storage (where a few magnetic domains may still point in the direction of the previous magnetization), SSDs store bits in discrete logic gates, meaning that even if there is a trace, the chip would need to be disassembled and the gates examined with an electron microscope one at a time. Furthermore, it's less likely that data can survive a single rewrite, given the mechanism used: instead of bytes being changed in place, whole blocks are erased completely and then rewritten, which means the erasing is less delicate because it doesn't need to avoid flipping neighboring bits.
This is all irrelevant because there is no way to have a sector on SSD erased for sure. Overwriting relocates, TRIM marks for future deletion, Secure Erase has the unpleasant side-effect of nuking all data and, as typically implemented, also doesn't remove data but only changes internal encryption keys.
Do you remember the dd challenge where they offered money to any company that could recover a drive that had been zeroed over once with dd? Nobody ever took it.
Javascript for a core program seriously...
`srm` works great.
> When using the terminal command rm (or DEL on Windows), files are not actually removed.

Yes they are. As a user, I see a file, type rm file, then it is gone. The file has been removed.

Yes those parts of that file are possibly recoverable back into the original file, even without much work, but the file has been removed.

The file, yes, but not the file contents. Worst of all, if you opened the file recently with any process, there is a pretty good chance that the file-descriptor can be found in:

  /proc/
somewhere.

And with that in mind you can use a combination of lsof, grep, sed or any other tool to still read the file as it was.
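
A small Python sketch of that effect, assuming a POSIX system (the /proc part is Linux-specific and is skipped elsewhere). `read_after_unlink` is a made-up name for the demo: it unlinks a temp file while holding it open, then reads the contents back through the descriptor and, where available, through /proc/self/fd.

```python
import os
import tempfile


def read_after_unlink(data):
    """Write data to a temp file, unlink it, and read it back anyway.

    Returns (name_gone, bytes_via_fd, bytes_via_proc_or_None).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)  # the directory entry is removed here
        name_gone = not os.path.exists(path)

        # The open descriptor still refers to the file's contents.
        os.lseek(fd, 0, os.SEEK_SET)
        via_fd = os.read(fd, len(data))

        # On Linux the same contents are reachable via /proc/<pid>/fd/<n>.
        via_proc = None
        proc_path = "/proc/self/fd/%d" % fd
        if os.path.exists(proc_path):
            with open(proc_path, "rb") as f:
                via_proc = f.read()
    finally:
        os.close(fd)
    return name_gone, via_fd, via_proc


if __name__ == "__main__":
    print(read_after_unlink(b"still here"))
```

The contents only become unreachable this way once the last process holding the file open closes it.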

You are correct that the file contents may remain in memory.

Neither this tool nor GNU shred nor BSD `rm -P` can do anything about it.