Hacker News
by degenerate 1217 days ago
Nearly all my personal photos were encrypted by the helprecover@foxmail.com ("HELP") variant of Phobos. I've been holding onto the encrypted copies for a while in hopes that some people were working on a crack, and I'm excited to read this update.

Sidecar question: when automating your backups, what's a good way to make sure your rolling backups aren't simply backing up malware-encrypted files? I found out too late that all my backups were encrypted. I only had 1-week retention to save space. Are longer retention and manual checks the only sane strategy? Or does some backup software have built-in sanity detection for crypto attacks by now?

16 comments

Don't do simple rolling backups. Use something with deduplication like borg backup, or ZFS/btrfs if you want to do it at the filesystem level. The backup size should not grow by much more than the size of any genuinely new files, so if you suddenly need twice as much backup space because all your files seem to have changed, you should get suspicious.
Also ensure your client does not have access to the backup server share so that ransomware can't encrypt backups on a network drive etc.
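Here is a rough Python sketch of that growth check. It splits each file into fixed-size chunks, hashes them with SHA-256, and reports what fraction of this run's bytes are new to the store. Borg and restic actually use content-defined (variable-size) chunking. The fixed 4 KiB chunks, the `new_bytes_ratio` name and the idea of alerting above some threshold are assumptions for illustration.

```python
import hashlib
from typing import Iterable, Set

CHUNK_SIZE = 4096  # hypothetical; real dedup tools use content-defined chunk boundaries


def new_bytes_ratio(blobs: Iterable[bytes], known: Set[str]) -> float:
    """Fraction of this run's bytes that were not already stored.

    `known` holds SHA-256 digests of chunks already in the backup store and is
    updated in place, so calling this once per backup run mimics dedup growth.
    """
    total = new = 0
    for blob in blobs:
        for start in range(0, len(blob), CHUNK_SIZE):
            chunk = blob[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            total += len(chunk)
            if digest not in known:
                known.add(digest)
                new += len(chunk)
    return new / total if total else 0.0
```

An unchanged photo library should give a ratio near 0.0 on the second run. If a nightly run suddenly reports something close to 1.0, most of your files have been rewritten, and it's worth a look before the old backups rotate out.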

My backup solution (backuppc/other syncs + zfs + sanoid/syncoid plus offsite server with zfs) means the backup server pulls files from the clients using backuppc/rsync. The backup server volume is zfs snapshotted regularly using sanoid. The offsite server pulls these from the backup server via syncoid/zfs send.

I'm not using rsync.net since I have my own infrastructure, but would definitely choose it as the offsite server if needed.

Yes, this. I use Borg via SSH to an off-site server. With the proper SSH config (forced command, no PTY, no forwarding, etc.) you can lock it down pretty well, especially since you can add an "append only" switch to the serve command that refuses any modification or deletion of existing snapshots.
Seconded. I have an external 2 TB HDD that stores a year's worth of daily backups of my laptop's 256 GB drive (~180 GB used at any time). My backup script (https://gist.github.com/Jeffrey-P-McAteer/7d4b9052825914b5e0...) takes maybe 30 minutes for a full backup and 5 minutes for most deltas. Files that are unchanged get hard-linked to the previous day's backup; new files are copied over and content-deduplicated by btrfs.
If nothing else, external hard drives are cheap and robust enough that I think more people should keep an offsite backup. Once a year, make a full backup, write the date on the outside, and leave it at your parents' house. Make that your family holiday ritual.
As of recently, my backup setup consists of 2x4 TB HDDs (from different manufacturers) with BTRFS in RAID 1, plugged into a small 2-drive USB3 docking station. With both checksumming and mirroring, it feels pretty safe from a hardware-failure/bit-rot standpoint (if one disk fails, you can still mount the other in "degraded" mode).
Agreed. I'll shill rsync.net (no affiliation, just a happy customer) and their ZFS VM backup service. It's basically just a lightweight FreeBSD VM with a big ZFS volume attached, so you can `zfs send` incremental backups to it, and they support meta-snapshotting of your backup machine on their end. I wrote https://github.com/wyager/zfs-backup to manage my automatic incremental backups, and there are a number of other tools like this.
One technique would be to place unchanging bait files that you pre-check before allowing the backup to proceed.
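A minimal Python sketch of the bait-file idea follows. It records the SHA-256 of each decoy once. Before each backup, it refuses to run if any decoy is missing or its contents have changed, which ransomware that renames or encrypts files would trip. The function names and the abort-on-any-change policy are assumptions, not an existing tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(paths: List[Path]) -> Dict[Path, str]:
    """Record the SHA-256 of each bait file; run once when the decoys are created."""
    return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def tampered_baits(expected: Dict[Path, str]) -> List[Path]:
    """Return bait files that have vanished or whose contents no longer match."""
    bad = []
    for path, digest in expected.items():
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            bad.append(path)
    return bad
```

A backup script would abort whenever `tampered_baits(expected)` is non-empty. The recorded digests should live somewhere the malware can't rewrite them, such as on the backup server.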
That’s a nifty and cheap idea. Now I'm wondering if I should make the standard juicy targets (e.g. ~/Documents, .config, .ssh) complete decoys and put all of my real data just off to the side. I could still be hit by a generic attack, but targeted data-extraction attempts would initially fail.
Hmmm, setting things like `~/.ssh` to non-standard locations would probably also block a lot of the standard dependency-chain malware going around.
My backups got attacked by ransomware, but I only caught it about 6-8 months after it occurred. Thankfully, my backup drive is copied to another backup which never deletes files, only copies them, so the renamed files that were encrypted were eventually copied over to my second drive, but the originals remained. The attacker wasn't aware of the second drive.
I do an off-line, off-site backup once a year that never gets overwritten. ZFS snapshots can help here, too, for online stuff, especially if you have monitoring about changeset size.
A ZFS server that you can only SSH into as one user, and that user can't `zfs destroy`, only `zfs receive`.
I use borgbackup to a server, and once a month I upload it to Backblaze B2 with a 90-day retention policy on the bucket. The policy doesn't let you modify or delete the files for 90 days, even if you have access to the account. It costs something like $5 for the 3x500 GB of backups I have.
I use rsync manually, and always with "--dry-run" first. Unless the ransomware is smart enough to rot my backup very, very, very slowly, I should be able to detect any problem simply by reading which files are about to be overwritten. 99.99% of the files I back up rarely change.

I also have most of my non-sensitive data on OneDrive, which keeps old versions of files.

I store two weeks of daily snapshots, and then 12 months of monthly snapshots. It gives me a year of pretty good coverage for only about 2x the cost of just two weeks of backups.
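That schedule can be expressed as a small pruning rule. The sketch below assumes one snapshot per day. Like restic's `--keep-daily`/`--keep-monthly` options, it keeps the newest snapshot in each bucket. The function name and defaults are illustrative only.

```python
from datetime import date
from typing import Iterable, Set


def snapshots_to_keep(snapshots: Iterable[date], daily: int = 14, monthly: int = 12) -> Set[date]:
    """Return the snapshot dates to keep; everything else can be pruned.

    Keeps the newest `daily` days, plus the newest snapshot of each of the
    newest `monthly` calendar months. A date kept by either rule is kept.
    """
    ordered = sorted(set(snapshots), reverse=True)
    keep = set(ordered[:daily])
    months_seen = set()
    for snap in ordered:
        month = (snap.year, snap.month)
        if month in months_seen:
            continue
        if len(months_seen) == monthly:
            break
        months_seen.add(month)
        keep.add(snap)
    return keep
```

With daily snapshots this keeps at most 14 + 12 = 26 snapshots, because the current month's entry usually overlaps with the dailies.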
> when automating your backups, what's a good way to make sure your rolling backups aren't simply backing up malware-encrypted files?

Maybe you could check the level of entropy (measure of randomness) of files before backing up - very high entropy could suggest encrypted data?

This is good. Some antivirus programs run this check, but some ransomware adapted by encrypting only a 16-byte AES block every so often in the file, so that the file becomes useless without its entropy increasing much.

Also, JPEG, PNG, .jar, .xlsx, etc. are already compressed, so pretty high entropy to begin with.

As others have pointed out, the growth rate of your de-duplicated backup size is probably the best way to detect ransomware.
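The entropy check mentioned above can be sketched in Python as Shannon entropy in bits per byte: 0 for a constant file, 8 for uniformly random bytes. As the replies point out, it's a weak signal on its own. JPEGs and ZIP-based formats like .xlsx already score close to 8, and intermittent encryption barely moves the number. The `looks_encrypted` helper and its threshold are hypothetical values to tune, not a reliable detector.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.9) -> bool:
    """Hypothetical heuristic: flag near-random content.

    Expect false positives on already-compressed formats (JPEG, PNG, .jar, .xlsx).
    """
    return shannon_entropy(data) >= threshold
```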

> what's a good way to make sure your rolling backups aren't simply backing up malware-encrypted files?

Compute checksums. Also, if you're storing diffs, check how many files the backup thinks changed. Your entire photo library getting re-uploaded should be a red flag.
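One way to count how many files changed, independently of the backup tool, is to keep a manifest of SHA-256 digests from the last good run and compare against it. Files that ransomware renames (for example, by appending an extension) show up as missing under their old names, so they are counted too. This is a Python sketch, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def build_manifest(root: Path) -> Dict[str, str]:
    """Map every file under `root` (as a relative POSIX path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB blocks so large videos don't need to fit in memory.
                for block in iter(lambda: f.read(1 << 20), b""):
                    digest.update(block)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def changed_fraction(old: Dict[str, str], new: Dict[str, str]) -> float:
    """Fraction of previously seen files that were modified, renamed or deleted."""
    if not old:
        return 0.0
    changed = sum(1 for name, digest in old.items() if new.get(name) != digest)
    return changed / len(old)
```

In a photo library where almost nothing changes, a fraction above a few percent between nightly runs is worth a manual look before letting the backup proceed.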

I do this using `rsync --dry-run` with `--stats` enabled. You can get the number of changed/deleted files from the stats and then decide whether to proceed without `--dry-run`. This is for off-site backups. For on-site, I prefer ZFS snapshots.
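Here is a Python sketch of that check, which parses the text that `rsync --dry-run --stats` prints. It assumes the stats wording of rsync 3.1 and later ("Number of regular files transferred:", "Number of deleted files:", with comma thousands separators). Other versions may word these lines differently, so check your own output first. The 100-file limit and the function names are arbitrary placeholders.

```python
import re
from typing import Dict

# Assumed rsync >= 3.1 --stats wording; adjust the patterns for other versions.
STAT_LINES = {
    "transferred": r"Number of regular files transferred:\s*([\d,]+)",
    "deleted": r"Number of deleted files:\s*([\d,]+)",
}


def parse_rsync_stats(output: str) -> Dict[str, int]:
    """Extract file counts from `rsync --stats` output; fail loudly if a line is missing."""
    counts = {}
    for key, pattern in STAT_LINES.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"could not find {key!r} count in rsync output")
        counts[key] = int(match.group(1).replace(",", ""))
    return counts


def safe_to_proceed(dry_run_output: str, max_changed: int = 100) -> bool:
    """True if the dry run would overwrite or delete at most `max_changed` files."""
    counts = parse_rsync_stats(dry_run_output)
    return counts["transferred"] + counts["deleted"] <= max_changed
```

A wrapper script could capture the dry run's stdout (for example with `subprocess.run(..., capture_output=True, text=True)`), call `safe_to_proceed` on it, and only then run the real transfer.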
I use restic, which has very flexible retention policies. Rather than simply retaining "x days", you can have it "keep the last x hourlies" all the way up to "keep the last x yearlies", and combine many criteria in one policy (a snapshot is kept if any criterion keeps it). Since it uses snapshotting, compression and deduplication, it's very storage-efficient. They also offer a simple REST server that lets you back up with append-only permissions, which can mitigate a ransomware attack.
Sorry that happened to you. Using a de-duplicating backup solution (such as rdiff-backup or restic), will let you keep daily increments indefinitely with very little overhead.
> Nearly all my personal photos were encrypted by the helprecover@foxmail.com ("HELP") variant of Phobos.

I can't assist you with recovery, and without a lot of logs and forensics data (or a significant performance improvement), the described method is likely unfeasible. But I'll try to find a matching sample and let you know if it's vulnerable.

> Sidecar question: when automating your backups, what's a good way to make sure your rolling backups aren't simply backing up malware-encrypted files?

Lots of good responses. I like incremental backups that never overwrite anything (supported out of the box by all copy-on-write filesystems, like ZFS or BTRFS). Not sure how to configure this on Windows.

I worked in backup software for 22 years, until a few years ago. My suggestion is to make a backup and store it long term on an external drive, then back up to the cloud: a full backup followed by daily incrementals, so you can go back to any point along the way at any time. Currently I use Acronis and it works well. I back up to their cloud so that malware can't encrypt my local backups too and prevent a restore after a failure or an attack. A good backup strategy doesn't overwrite old backups until it must, and backs up regularly, so you don't lose more data than you can afford to lose. Remember recovery time as well.
Don't back up file names; back up checksums.
I agree. And for some stuff you get cryptographic checksums for free.

Backups of Git repositories:

    ... #  git fsck --full
    error: unable to unpack contents of .git/objects/a2/cf1a9631658799733f43c3b3f0a799696a4b21
    error: a2cf1a9631658799733f43c3b3f0a799696a4b21: object corrupt or missing: .git/objects/a2/cf1a9631658799733f43c3b3f0a799696a4b21
Oops... Whether it's malware, an undetected bit flip caused by bad luck and the lack of ECC memory (on an otherwise okay Git repo), or a failing disk, it's trivial to detect that the repo is corrupted.

Same for my ripped archive of audio CDs. The rippers save lots of information, and the rips are bit-perfect, cross-checked against the checksums of other people's rips. And the checksums are all there.

For family pictures, I add a checksum to the pictures myself.

Backups aren't really backups until they've been verified :)

People normally say don’t store binaries in git. Is this a big issue if the files don’t change very often? From what I understood the biggest problem is they don’t diff well. With photos not changing very often, can it work?

Anyone tried using git for 500G of photos?

I would love to if it worked. I have my photo collection spread out over multiple computers, and merging the edits into the master backup is always a PITA. “Was this file removed from copy A or added to copy B?” All those problems just solve themselves with a clear DVCS git history.

Not being able to ever remove photos to free up space is, of course, another issue with git.

Git simply wasn't designed for that, and the key issue with storing binaries in it is the one you mentioned last: a full clone contains the full history of every file. Deleting a file in git doesn't remove it from history, so a fresh full clone of 500G of photos isn't going to be 500G; it's going to be that plus every version of every file that ever existed in history. A shallow clone avoids that, and shallow clones supposedly work better in recent versions of git, but fundamentally you're using a hammer on screws, as it were.

If you're open to new tools, git-annex is what you're looking for. The other two options are Subversion, which has some DVCS features these days, or Perforce Helix Core (paid), though I can't vouch for the latter as I've never used it.

We've been working on some open source tooling called "oxen" that was built for large datasets of images, video, audio, text etc. We wanted to solve the exact problem you're flagging here with git.

Feel free to check it out here https://github.com/Oxen-AI/oxen-release#-oxen would love any feedback!

I guess that 500GB repository would be barely usable.

You should check out Git LFS if you want to do that, as it sounds like a good idea in the first place!

What do you mean by adding a checksum to the picture? Do you add it as a filename suffix, e.g. IMG0001_<checksum>.jpg, something like that? Or do you tuck it into the EXIF data and have a tool that computes the checksum of the file minus the checksum part?
Yup, exactly, just adding a suffix. I'm not only backing up .jpg files. For example, I also back up a few screenshots (some in .png and some in .webp format).

So I don't have to care about the format of the pictures (or short family movies).

I just wrote some Clojure / babashka code to do that. I also truncate the checksum so the filename doesn't become gigantic. It's not security-sensitive content; the checksum is just there to detect corruption.

Then I can use another computer to generate, say, thumbnails of all the pictures and do a quick visual check. If they look correct, from then on I can just have the checksums verified automatically.

Funnily enough, I had a few old JPG pictures that were corrupt, but I ended up finding the correct versions in older backups.

The checksum helps there too. Otherwise you have two files with the same name (say, on different HDDs), only one of which is correct, and you don't know which one without opening them manually.

It's not super advanced and maybe a bit overkill but it's not complicated and works fine for my use case.

P.S.: I suppose another way would be to use a filesystem that uses content-based addressing or does checksumming for me.
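The commenter's babashka code isn't shown, so here is a rough Python equivalent of the scheme as described. It appends a truncated SHA-256 to the file name before the extension (e.g. `IMG0001_<checksum>.jpg`) and provides a verifier. The 12-character truncation and the function names are assumptions. As noted above, truncation is fine here because the goal is catching corruption, not resisting tampering.

```python
import hashlib
from pathlib import Path


def short_digest(path: Path, length: int = 12) -> str:
    """First `length` hex characters of the file's SHA-256."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]


def add_checksum_suffix(path: Path, length: int = 12) -> Path:
    """Rename IMG0001.jpg to IMG0001_<checksum>.jpg and return the new path."""
    target = path.with_name(f"{path.stem}_{short_digest(path, length)}{path.suffix}")
    path.rename(target)
    return target


def checksum_matches(path: Path) -> bool:
    """Recompute the digest and compare it with the suffix stored in the name."""
    _, sep, recorded = path.stem.rpartition("_")
    if not sep or not recorded:
        return False  # no checksum suffix to compare against
    return short_digest(path, len(recorded)) == recorded
```

In the "two files with the same name, only one correct" case, running `checksum_matches` on both copies tells you which one to keep.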

Yeah, ZFS is supposed to alert you somehow. I've been curious what that workflow actually feels like for the end user. Restoring from backup when checksums don't match is excellent, and I've been hoping to try that myself ever since I discovered that various low-priority files had bit rot.
I've been playing around with Beyond Compare snapshots. I've taken whole-drive snapshots, and I'm pretty close to running a diff to see how things have evolved on my drives and where the file system activity has shifted. The snapshot files are pretty small, in the MB range, maybe 6 or 10 MB; I forget.

https://www.scootersoftware.com/v4help/index.html?snapshots....

For something like photos you could just store every revision.
Don't keep only snapshots up to 1 week old. Also keep a 2-week-old, a 3-week-old and a 1-month-old one.