Hacker News
by d332 704 days ago
This inspired me to read up on the low-level details of CD structure. I'm curious if anybody has scanned an entire CD and shared the results, so that we could work with a raw image of the disc that contains all its quirks, as opposed to the typical .iso format?
3 comments

It's really difficult. Unlike floppy disks, where you tell the drive to seek and get back raw magnetic pulses (so you can produce raw flux images), or hard disks, where you tell the drive to read an arbitrary sector and get a blob of data (so you can produce sector-level images), the protocol for talking to a CD-ROM drive involves asking for track/sector addresses. That means you have to trust the drive to interpret all the track metadata and error correction for you. You generally can't just dump the "raw" data and do the interpretation yourself.

That's why the most robust CD image format is the BIN/CUE format. The BIN file contains all the sectors the drive allows us to read, the CUE file contains the disc metadata as interpreted for us by the drive firmware.
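To make the BIN/CUE relationship concrete, here is a minimal sketch of how a CUE timestamp can be mapped to a byte offset in the BIN file. It assumes a single BIN file of raw 2352-byte sectors and the usual `mm:ss:ff` CUE time format with 75 frames (sectors) per second. The function names are made up for illustration and are not from any particular tool.

```python
# Assumptions (not from the thread): one BIN file, raw 2352-byte sectors,
# CUE times given as mm:ss:ff with 75 frames (sectors) per second.
SECTOR_BYTES = 2352
FRAMES_PER_SECOND = 75


def msf_to_sector(msf: str) -> int:
    """Convert a CUE 'mm:ss:ff' timestamp to a sector index in the BIN."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def msf_to_bin_offset(msf: str) -> int:
    """Byte offset in the BIN file where the given timestamp starts."""
    return msf_to_sector(msf) * SECTOR_BYTES


if __name__ == "__main__":
    # A track whose INDEX 01 is at 00:02:00 starts 150 sectors into the file.
    print(msf_to_sector("00:02:00"), msf_to_bin_offset("00:02:00"))
```

Everything the CUE says is still the drive's interpretation of the table of contents; the sketch only shows how that metadata indexes into the sector dump.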

There are some drives which support extra "raw read" commands, but they're incredibly rare and consequently in great demand by CD preservation projects like redump.org.

Some people have used the contents of BIN/CUE data to reconstruct what should actually be on the disc, but that's not quite the same thing. Here's a great explanation of the CD structure in all its complexity:

https://github.com/higan-emu/emulation-articles/tree/master/...

Even BIN/CUE is not enough. It cannot store subchannel data like CD+G, and it can only hold a single session, which breaks Blue Book CDs that combine audio and data.

We do not currently have a widely supported image format that can properly hold all the data from a CD. Aaru [0] is close, but still has to output back to other formats like BIN/CUE to use the contents of the disc.

[0] https://www.aaru.app/#/

Apparently makemkv forum members created some patched firmware that lets you raw read Blu-rays for the sake of extracting metadata that's intentionally hidden for DRM. Though I'll have to recheck my understanding, since you're saying you can't actually raw read discs anyway.
Audio CDs were never ripped/transferred as ISO files. ISO-9660 is a filesystem that came years later, and Redbook audio CDs simply do not contain files.

If you want to look at the structure of a whole audio CD, then one way is to rip it with a decent tool (perhaps cdrdao or EAC) and generate a bin/cue file pair as an output.

But that's not my goal. I'd like to be able to observe every groove, the physical encoding of data, and see if I could implement decoding from scratch. The first problem, though, is that I don't know how to get a microscopic image of the disc.
You don't need a microscopic image of a disc to do that; a two-dimensional photograph is of essentially no advantage here.

All you need is the unmolested data from that disc. The data is arranged on a single spiral groove starting from the center and slowly winding its way towards the outside.

The data is completely linear: It begins at the beginning, and continues to the very end without interruption. This is all akin to (although opposite of) how a single-track vinyl record is physically laid out. The entire CD -- whatever it contains -- is just a continuous string of pits and lands.
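For a sense of scale, here is a rough sketch of how long that single spiral is. The numbers are assumptions on my part, not from the comment above: a nominal track pitch of 1.6 µm and a program area running from roughly 25 mm to 58 mm radius. The spiral is approximated as the area of that ring divided by the track pitch.

```python
import math

# Assumed nominal figures (not from the thread):
TRACK_PITCH_MM = 0.0016   # 1.6 micrometres between adjacent turns
INNER_RADIUS_MM = 25.0    # approximate start of the program area
OUTER_RADIUS_MM = 58.0    # approximate end of the program area


def spiral_turns(r_in: float, r_out: float, pitch: float) -> float:
    """Number of turns the spiral makes between the two radii."""
    return (r_out - r_in) / pitch


def spiral_length_m(r_in: float, r_out: float, pitch: float) -> float:
    """Approximate spiral length: ring area divided by track pitch, in metres."""
    ring_area_mm2 = math.pi * (r_out ** 2 - r_in ** 2)
    return ring_area_mm2 / pitch / 1000.0


if __name__ == "__main__":
    turns = spiral_turns(INNER_RADIUS_MM, OUTER_RADIUS_MM, TRACK_PITCH_MM)
    length = spiral_length_m(INNER_RADIUS_MM, OUTER_RADIUS_MM, TRACK_PITCH_MM)
    print(f"about {turns:,.0f} turns, about {length / 1000:.1f} km of track")
```

Under these assumptions the track comes out at a bit over 5 km in roughly twenty thousand turns, all of it one continuous line.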

And to observe that string as it appears on a real disc, all you need to get started is a regular old-school CD player and some appropriate data acquisition gear, and maybe an oscilloscope to help figure out what you're looking at.

The optics and basic motor controls are already solved problems, and it doesn't even have to be particularly fast data acquisition gear by today's standards to record what is happening.

Look into the Domesday Duplicator project for Laserdiscs as an example of how what ssl-3 is talking about can be done using a high sample rate input. That exact process is possible and, with enough storage and processing power, can be used to get the most "low level" access to the data. It is not for the faint of heart though, and can take around 1TB of storage and hours of CPU time to process full movies in this way. I know because I've done it.

I believe I've seen that there is work being done to attempt this on CDs, but it was still in the exploratory phase and not yet ready to start archiving with. It might seem like overkill to do this to something meant to be digitally addressed, but I've experienced enough quirks with discs and drives when ripping that I would 100% be willing to switch over to a known complete capture system to not have to worry about it anymore. Post-process decoding also allows for re-decoding data later if better methods are found.

The "unmolested data" would still have undergone error correction though, wouldn't it? I don't think a bin/cue rip would contain the redundant stuff, which GP seems interested in, nor the subcodes (of which some are represented in the cue file, while the bin file is PCM audio).

And at the risk of taking us well beyond the rainbow books, I'll just leave this here: https://www.psxdev.net/forum/viewtopic.php?f=70&t=1266

There is a layer betwixt the optical reflection and the audio output that exists only as raw signals, before any molestation/error correction occurs.

There cannot not be this layer.

(And with a sufficiently-old-school CD player, it is probably not even challenging to get to it. The less-integrated the parts are, the better.)

Ah, I see. So what kind of capture hardware could read from that point? I assume it's a digital signal switching between two voltage levels, flipping on the order of 3.6 MHz (16 billion pits to read over 74*60 seconds). With Red Book audio at 1.4 Mbps, more than half of the raw data must be devoted to things like redundancy and other non-PCM stuff, if my interpretation that pits==bits isn't far off.
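The arithmetic in that estimate can be checked with a short sketch. The first part uses only the numbers from the comment. The second part uses the commonly quoted Red Book frame layout (588 channel bits per frame, each carrying 24 bytes of audio), which is my addition rather than something stated in the thread, so treat it as an assumption.

```python
# Part 1: numbers taken from the comment.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS   # Red Book PCM bit rate

estimated_pits = 16_000_000_000      # the commenter's figure
play_seconds = 74 * 60
estimated_rate_hz = estimated_pits / play_seconds       # roughly 3.6 MHz

# Part 2: assumed nominal frame layout (not from the thread).
CHANNEL_BITS_PER_FRAME = 588
AUDIO_BYTES_PER_FRAME = 24
frames_per_second = pcm_bps // 8 // AUDIO_BYTES_PER_FRAME
channel_bps = frames_per_second * CHANNEL_BITS_PER_FRAME

# Share of the raw stream that ends up as PCM audio, under each figure.
audio_share_estimate = pcm_bps / estimated_rate_hz
audio_share_nominal = pcm_bps / channel_bps

if __name__ == "__main__":
    print(f"PCM rate:            {pcm_bps:,} bit/s")
    print(f"Estimated pit rate:  {estimated_rate_hz:,.0f} per second")
    print(f"Nominal channel rate:{channel_bps:,} bit/s")
    print(f"Audio share: {audio_share_estimate:.2f} (estimate), "
          f"{audio_share_nominal:.2f} (nominal)")
```

Either way, well under half of what comes off the disc is audio samples, which agrees with the guess that most of the raw stream is modulation overhead, error correction, sync, and subcode.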

Aside: is your username inspired by Secure Socket Layer or Solid State Logic?

Not necessarily. It depends if you're extracting data+subchannel data or corrected track data only.
You might do well enough with https://en.wikipedia.org/wiki/Cdparanoia without needing to use different hardware to scan the disc. Instead it relies on the CD drive's ability to report on inaccuracies in keeping in sync with the grooves.
I wonder if you could just tear the controller out of a CD/DVD drive and build a new one from scratch, kind of like the new floppy controllers being used now to read the raw magnetic data. You could just command the head to move to the center, find the beginning of the data and just keep reading until you hit the buffers.
Sorta, kinda? It's a bit of a different game.

Floppies (most of them, anyway) have fixed track widths, and these tracks are arranged cylindrically, and these cylinders align with the steps of the stepper motor that is used to actuate the head assembly.

It's relatively easy, with the right ratio betwixt step advancement and track width, to get the head moving properly on a new implementation of a floppy controller. Want to read track 1? Step the head N times to reach track 1 from wherever it started, and read it. Next, want to read track 33? Step the head N times to track 33, and read that.

But tracking the spiral groove of a CD is a very different problem to solve. Steps tend to lose their meaning. Instead of electromagnetic steps, it involves 3 different laser beams: Two to continuously keep the head centered where it needs to be on the ever-changing groove using a servo feedback loop, and a third to read the data from the pits and lands from the middle of that groove.

Is it do-able? Sure! People with far less advanced tech than we on HN might have lying around did it 40+ years ago.

It's just a very different nut to crack than reading a floppy is, even if the mechanical and optical bits are recycled.

(And that's just head positioning. The pits and lands still need to be read, and those reflect back from the disc as optical phase shifts, not as changes in magnetic polarity and/or amplitude.)

Why? You can extract raw data and raw subchannel data directly from a CD/DVD drive. This isn't the case with how floppy drives work.
The "why" was covered in a parent comment: https://news.ycombinator.com/item?id=40923030
Coming back to this, having read some of the (great!) replies, I'm going to go out on a limb and say that in theory, this sounds possible, and fun, but highly impractical. I'll assume that by "scan" you mean a high end "flatbed scanner" optical scan which would return a 2D bitmap.

It's impractical because the resolution required to retrieve the data "flatbed scanner style" is comically high, perhaps 50k dpi, far beyond the capability of any commercial unit and well into scanning microscope territory. Sure, from my understanding, it looks technically possible. But it would be a very significant and costly project just to assemble the image in the first place. Even if you had that, the resulting file would be hilariously huge (something like 122GB), extremely difficult to work with, and you would be starting from scratch implementing some kind of visual pathfinding helical decoder to painstakingly unravel the linear coil of data the scan just sort of blatted into two dimensions.
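As a back-of-the-envelope check on the scale involved, here is a sketch that sizes an uncompressed square scan of a 120 mm disc at a given resolution. The bit depths are assumptions, and the comment does not say how its 122GB figure was derived, so this only shows the order of magnitude rather than reproducing that number.

```python
DISC_DIAMETER_MM = 120.0   # standard CD diameter
MM_PER_INCH = 25.4


def pixel_pitch_um(dpi: int) -> float:
    """Size of one scanned pixel in micrometres."""
    return MM_PER_INCH * 1000.0 / dpi


def scan_pixels(dpi: int) -> int:
    """Pixel count of a square scan just covering the disc."""
    side = round(DISC_DIAMETER_MM / MM_PER_INCH * dpi)
    return side * side


def scan_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes at the given (assumed) bit depth."""
    return scan_pixels(dpi) * bits_per_pixel // 8


if __name__ == "__main__":
    dpi = 50_000
    print(f"pixel pitch: {pixel_pitch_um(dpi):.3f} um")
    for depth in (8, 16, 24):   # assumed bit depths
        print(f"{depth:2d} bits/pixel: {scan_bytes(dpi, depth) / 1e9:.1f} GB")
```

At 50k dpi a pixel is about half a micrometre across, so only about three pixels span the distance between adjacent turns if the track pitch is taken as 1.6 µm (an assumed nominal value), and the image already runs to tens or hundreds of gigabytes.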

It's a cool idea. But it's comically, exponentially harder than just using the equipment as intended to read the laser returns off the disc directly, into a far, far more easily dealt with format.

I'm adding that CD scan to my list of things I'd like to do if I ever get really rich.