Hacker News new | ask | show | jobs
by rep_lodsb 55 days ago
>All of those have control data in the same stream under the hood.

Not true. For most binary protocols, you have something like <Header> <Length of payload> <Payload>. On magnetic media, sector headers used a special pattern that couldn't be produced by regular data [1] -- and I'm sure SSDs don't interpret file contents as control information either!

There may be some broken protocols, but in most cases this kind of problem only happens when all the data is a stream of text that is simply concatenated together.

[1] e.g. https://en.wikipedia.org/wiki/Modified_frequency_modulation#...

1 comments

The header and length of the payload are control data. It's still being concatenated even if it's binary. A common way to screw that one up is to measure the "length of payload" in two different ways, for example by using the return value of strlen or strnlen when setting the length of the payload but the return value of read(2) or std::string size() when sending/writing it or vice versa. If the data unexpectedly contains an interior NULL, or was expected to be NULL terminated and isn't, strnlen will return a different value than the amount of data read into the send buffer. Then the receiver may interpret user data after the interior NULL as the next header or, when they're reversed, interpret the next header as user data from the first message and user data from the next message as the next header.

Another fun one there is that if you copy data containing an interior NULL to a buffer using snprintf and only check the return value for errors but not an unexpectedly short length, it may have copied less data into the buffer than you expect. At which point sending the entire buffer will be sending uninitialized memory.

Likewise if the user data in a specific context is required to be a specific length, so you hard-code the "length of payload" for those messages without checking that the user data is actually the required length.

This is why it needs to be programmatic. You don't declare a struct with header fields and a payload length and then leave it for the user to fill them in, you make the same function copy N bytes of data into the payload buffer and increment the payload length field by N, and then make the payload buffer and length field both modifiable only via that function, and have the send/write function use the payload length from the header instead of taking it as an argument. Or take the length argument but then error out without writing the data if it doesn't match the one in the header.

From your previous post:

>It's user data in JSON in an HTTP stream in a TLS record in a TCP stream in an IP packet in an ethernet frame. Then it goes into a SQL query which goes into a B-tree node which goes into a filesystem extent which goes into a RAID stripe which goes into a logical block mapped to a physical block etc. All of those have control data in the same stream under the hood.

It's true that a lot of code out there has bugs with escape sequences or field lengths, and some protocols may be designed so badly that it may be impossible to avoid such bugs. But what you are suggesting is greatly exaggerated, especially when we get to the lower layers. There is almost certainly no way that writing a "magic" byte sequence to a file will cause the storage device to misinterpret it as control data and change the mapping of logical to physical blocks. They've figured out how to separate this information reliably back when we were using floppy disks.

That the bits which control the block mapping are stored on the same device as a record in an SQL database doesn't mean that both are "the same stream".

> There is almost certainly no way that writing a "magic" byte sequence to a file will cause the storage device to misinterpret it as control data and change the mapping of logical to physical blocks.

Which is also what happens if you use parameterized SQL queries. Or not what happens when one of the lower layers has a bug, like Heartbleed.

There also have been several disk firmware bugs over the years in various models where writing a specific data pattern results in corruption because the drive interprets it as an internal sequence.

I distinctly remember bugs with non-Hayes modems where they would treat `+++ATH0` coming over the wire as a control, leading to BBS messages which could forcibly disconnect the unlucky user who read it.

In this particular case, IIRC Hayes had patented the known approach for detecting this and avoiding the disconnect, so rival modem makers were somewhat powerless to do anything better. I wonder if such a patent would still hold today...

https://en.wikipedia.org/wiki/+++ATH0#Hayes'_solution

What was patented was the technique of checking for a delay of about a second to separate the command from any data. It still had to be sent from the local side of the connection, so the exploit needed some way to get it echoed back (like ICMP).

More relevant to this bug: https://en.wikipedia.org/wiki/ANSI_bomb#Keyboard_remapping

DOS had a driver ANSI.SYS for interpreting terminal escape sequences, and it included a non-standard one for redefining keys. So if that driver was installed, 'type'ing a text file could potentially remap any key to something like "format C: <Return> Y <Return>".