This is a cool idea. Please don't take the below as gratuitous negativity, just a reminder that these are hard problems for which there are no general solutions. The README says it was tested on ZFS, but I doubt its utility in real-world deployments. I don't know of anyone who has significant data in a ZFS pool that isn't using one or more of: raidz, compression, encryption, or embedded_data.

raidz means that logical blocks aren't allocated as single physical blocks but are instead striped across multiple drives. Finding the SBX magic isn't enough to get you the rest of the block. The checksum might let you try appending sectors from other disks until you find the remainder of the block, but given that it's only a CRC16, it probably won't.

Transparent compression prevents you from identifying the magic header in each block unless you decompress every disk sector that could hold data. That's certainly feasible, but it complicates recovery if you don't know which compression algorithm was in use: ZFS supports at least 3, and pools will generally have at least one in use even when compression is off for user data, since metadata is compressed by default.

Encryption (present in Oracle ZFS) means there's no plaintext data to recover.

embedded_data is a feature flag (on by default in versions of ZFS that support it) that packs data directly into block pointer structs when the amount of data is small. I can easily imagine the final block of an SBX file, which may be mostly padding, getting compressed into one of those block pointers, which may itself be embedded in a larger structure that is part of an array that's compressed by default. That array is also probably long enough that its compressed stream spans multiple blocks, and you may have lost some of the early ones, making the rest of it unrecoverable.
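To put a number on the CRC16 concern in the raidz case: a 16-bit checksum has only 65,536 values, so if wrong candidates produce roughly uniform CRCs, each one passes with probability about 1/65,536. The sketch below is a hypothetical illustration, not SeqBox's actual recovery logic. It assumes a simplified block layout (magic, then a 2-byte CRC over the rest) and uses Python's `binascii.crc_hqx` (CRC-CCITT polynomial) as a stand-in CRC16; the real SBX header layout and CRC parameters may differ. Given the surviving head of a block and candidate tails read from other disks, it returns every tail that makes the stored CRC check out.

```python
import binascii
import struct

# Hypothetical simplified layout: MAGIC + 2-byte big-endian CRC16 + payload.
MAGIC = b"SBx"


def make_block(payload: bytes) -> bytes:
    """Build a toy block whose CRC covers the payload (illustrative only)."""
    crc = binascii.crc_hqx(payload, 0)
    return MAGIC + struct.pack(">H", crc) + payload


def matching_tails(head: bytes, candidate_tails: list[bytes]) -> list[bytes]:
    """Return every candidate tail for which head + tail passes the stored CRC16.

    `head` is the fragment found by scanning one disk for MAGIC; the candidate
    tails are sectors from other disks. With only 16 bits, wrong tails can pass too.
    """
    if not head.startswith(MAGIC):
        raise ValueError("head does not start with the magic")
    (stored_crc,) = struct.unpack(">H", head[len(MAGIC):len(MAGIC) + 2])
    # crc_hqx can be chained, so the head's contribution is computed only once.
    head_crc = binascii.crc_hqx(head[len(MAGIC) + 2:], 0)
    return [t for t in candidate_tails if binascii.crc_hqx(t, head_crc) == stored_crc]


def expected_false_matches(num_wrong_candidates: int, crc_bits: int = 16) -> float:
    """Expected spurious passes, assuming wrong candidates' CRCs are uniform."""
    return num_wrong_candidates / 2 ** crc_bits
```

For example, `expected_false_matches(1_000_000)` is about 15, so with a million candidate sectors you would expect around 15 tails that pass the check without being the right one, and nothing in the CRC alone tells you which match is real.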
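The compression point can be illustrated with a small scanner that looks for the magic both in the raw sector and in whatever each available decompressor produces from it. This is a hypothetical sketch: it only knows about zlib/deflate, because Python's standard library has no LZ4, LZJB or ZLE codec (all algorithms ZFS can use); it assumes a compressed block starts at the beginning of the sector being tested; and `b"SBx"` is used as the magic purely for illustration.

```python
import zlib
from typing import Callable, Dict, List

MAGIC = b"SBx"  # assumed magic, for illustration only


def _zlib_prefix(data: bytes) -> bytes:
    """Decompress a zlib stream at the start of data; trailing padding is ignored."""
    return zlib.decompressobj().decompress(data)


# Hypothetical decoder registry; a real tool would also need LZ4, LZJB, ZLE, etc.
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "raw": lambda data: data,
    "zlib": _zlib_prefix,
}


def methods_revealing_magic(sector: bytes) -> List[str]:
    """Return the names of the decoders whose output contains MAGIC."""
    found = []
    for name, decode in DECODERS.items():
        try:
            plain = decode(sector)
        except zlib.error:
            continue  # not a valid stream for this decoder
        if MAGIC in plain:
            found.append(name)
    return found
```

A raw scan only succeeds on uncompressed sectors; for everything else the scanner has to guess the codec, and every codec it doesn't know about is a sector whose SBX blocks stay invisible.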
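A quick way to sanity-check the embedded_data scenario is to see how small a mostly-padding final block gets once compressed. As I understand the OpenZFS source, an embedded block pointer carries at most 112 bytes of payload (`BPE_PAYLOAD_SIZE`). The sketch below uses that limit, with zlib standing in for whatever compressor the pool actually uses, and assumes a 512-byte SBX block padded with 0x1A bytes; those SBX parameters are assumptions for illustration, not taken from the spec.

```python
import random
import zlib

EMBEDDED_PAYLOAD_LIMIT = 112  # max payload bytes in an embedded block pointer
BLOCK_SIZE = 512              # assumed SBX block size, for illustration
PAD_BYTE = 0x1A               # assumed padding byte, for illustration


def final_block(tail_data: bytes) -> bytes:
    """Build a hypothetical final block: a little real data, then padding."""
    if len(tail_data) > BLOCK_SIZE:
        raise ValueError("tail_data is larger than one block")
    return tail_data + bytes([PAD_BYTE]) * (BLOCK_SIZE - len(tail_data))


def would_fit_embedded(block: bytes) -> bool:
    """True if the block compresses (zlib as a stand-in) within the embedded limit."""
    return len(zlib.compress(block, 9)) <= EMBEDDED_PAYLOAD_LIMIT


if __name__ == "__main__":
    mostly_padding = final_block(b"SBx" + b"last few bytes of the file")
    full_of_data = random.Random(0).randbytes(BLOCK_SIZE)  # Python 3.9+
    print(would_fit_embedded(mostly_padding))  # True
    print(would_fit_embedded(full_of_data))    # False
```

The mostly-padding block shrinks to a few dozen bytes, comfortably under the limit, while a block of incompressible data does not come close; that asymmetry is why the last block of a file is the likeliest to vanish into a block pointer.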