


	by taminka 115 days ago
	this is amazing, counter to what most ppl think, majority of memory bugs are from out of bounds access, not stuff like forgetting to free a pointer or some such

4 comments

Night_Thastus 115 days ago

Personally, as someone in C and C++ for the last few years, memory access is almost never the root bug. It's almost always logic errors. Not accounting for all paths, not handling edge cases, not being able to handle certain combinations of user or file input, etc.

Occasionally an out-of-bounds access pops up, but they're generally so blindingly obvious and easy to fix that it's never been the slow part of bug fixing.

lelanthran 115 days ago

I've been programming for a long time; the ratio of memory errors to logic bugs in production is so low as to be non-existent.

My last memory error in C code in production was in 2018. Prior to that, I had a memory error in C code in production in 2007 or 2008.

In C++, I eventually gave up trying to ship the same level of quality and left the language altogether.

vlovich123 115 days ago

The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc. This doesn’t mean that logic bugs are less common, it just means that memory safety issues are the easiest way to get access.

As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.

Keep in mind the data for this comes from popular projects that have enough attention to warrant active exploit research by a wide population. This is different from a project you wrote that doesn’t have the same level of attention.

lelanthran 115 days ago

> The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc.

You are misremembering the various reports - the reports were not that 80%[1] of issues were due to memory errors, but more along the lines of 80% of exploits were due to memory errors.

You could have 1000 bugs, with 10 of them being vulnerabilities, and 8 of those 10 being due to memory errors, and that would still be in line with the reports.
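
To make the arithmetic in that example concrete, here is a minimal sketch in C. The helper name `share_percent` is hypothetical, and the counts are the illustrative ones from the comment above (1000 bugs, 10 vulnerabilities, 8 memory-error vulnerabilities), not measured data.

```c
#include <assert.h>

/* Hypothetical helper: percentage that `part` makes up of `total`.
 * `total` must be positive. */
double share_percent(int part, int total) {
    assert(total > 0);
    return 100.0 * part / total;
}
```

With those numbers, `share_percent(8, 10)` gives 80 (memory errors as a share of vulnerabilities), while `share_percent(8, 1000)` gives 0.8 (memory errors as a share of all bugs), so both claims can hold at once.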

> As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.

Payments processing, telecoms and munitions control software.

Of those, your explanation only applies to Telecoms; payments processing (EMV) was basically a constant stream of daily attacks, while munitions are live, in the field, with real explosives. We would've noticed any bugs, not just memory error bugs with the munitions one.

--------------------

[1] The number wasn't 80% IIRC, more like 70%?

vlovich123 114 days ago

Sorry, I didn’t misremember but I wrote it down without proofreading (see another comment where I got it right). I did indeed mean 80% of security vulnerabilities are caused by memory safety issues.

For EMV you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java webserver frontend talking to some C processing / crypto code in which case again you’re less likely to encounter bugs in your code because it’s difficult to find a path to injecting unsanitized input.

For munitions there’s not generally I/O with uncontrolled input so it’s less likely you’d find cases where you didn’t properly sanitize inputs and relied on an untrusted length to access a buffer. As a famous quote states, it’s ok if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2.
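
As a rough illustration of "relied on an untrusted length to access a buffer", the sketch below shows the check that is typically missing in that class of bug. The function `copy_payload` and its parameters are hypothetical and not taken from any codebase discussed here; it assumes a message whose length field was supplied by the sender.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a payload whose length was claimed by an
 * untrusted sender. Returns 0 on success, -1 if the claimed length
 * exceeds either the bytes actually received or the destination size. */
int copy_payload(unsigned char *dst, size_t dst_cap,
                 const unsigned char *msg, size_t msg_len,
                 size_t claimed_len) {
    assert(dst != NULL && msg != NULL);
    /* Validate the claimed length against both buffers before copying. */
    if (claimed_len > msg_len || claimed_len > dst_cap) {
        return -1;
    }
    memcpy(dst, msg, claimed_len);
    return 0;
}
```

Dropping the `if` and trusting `claimed_len` directly is what turns a malformed message into an out-of-bounds read or write.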

lelanthran 114 days ago

> For EMV you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java webserver frontend talking to some C processing / crypto

EMV terminals. No Java involved.

> As a famous quote states, it’s ok if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2

Look, first you commented that it's not possible for nontrivial or non-networked devices, now you're trivialising code that, if wrong, directly killed people!

All through the 80s, 90s and 2000s (and even now, believe it or not), the world was filled with millions and millions of devices programmed in C, and yet you did not live a life where all the devices around you routinely crashed.

Cars, microwaves, security systems... they didn't routinely crash even though they were written in C.

thomasmg 115 days ago

Yes. The problem is that most memory errors (out of bounds + use after free etc.) result in a vulnerability. Only a minority of the logic errors do.

For operating systems kernels, browsers etc, vulnerabilities have a much, much bigger impact than logic errors: vulnerabilities need to be fixed immediately, and released immediately. Most logic errors don't need to be fixed immediately (sure, it depends on the issue, and on the type of software.)

I would probably say "for memory unsafe languages, 80% of the _impact_ is due to memory vulnerabilities"

taminka 115 days ago

logic errors aren't memory errors, unless you have some complex piece of logic for deallocating resources, which, yeah, is always tricky and should just generally be avoided

woodruffw 115 days ago

"Majority" could mean a few things; I wouldn't be surprised if the majority of discovered memory bugs are spatial, but I'd expect the majority of widely exploited memory bugs to be temporal (or pseudo-temporal, like type confusions).

Retr0id 115 days ago

I think UAFs are more common in mature software

q3k 115 days ago

Or type confusion bugs, or any other stuff that stems from complex logic having complex bugs.

Boundary checking for array indexing is table stakes.

michh 115 days ago

table stakes, but people still mess up on it constantly. The "yeah, but that's only a problem if you're an idiot" approach to this kind of thing hasn't served us very well so it's good to see something actually being done.

Trains shouldn't collide if the driver is correctly observing the signals, that's table stakes too. But rather than exclusively focussing on improving track to reduce derailments we also install train protection systems that automatically intervene when the driver does miss a signal. Cause that happens a lot more than a derailment. Even though "pay attention, see red signal? stop!" is conceptually super easy.

q3k 115 days ago

I'm not saying it's not important, it is. I just don't believe that '[the] majority of memory bugs are from out of bounds access'. That was maybe true 20 years ago, when an unbounded strcpy to an unprotected return pointer on the stack was super common and exploiting this kind of vulnerability was what most vulndev was.

This brings C one tiny step closer to the state of the art, which is commendable, but I don't believe codebases which start using this will reduce their published vulnerability count significantly. Making use of this requires effort and diligence, and I believe most codebases that can expend such effort already have a pretty good security track record.
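
For readers unfamiliar with the unbounded strcpy pattern mentioned above, here is a minimal sketch of the manual alternative: check the length before copying. The helper name `copy_name` is hypothetical, and this is plain standard C, not the feature under discussion in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst` only if it fits,
 * including the terminating NUL. Returns 0 on success, -1 otherwise.
 * A bare strcpy(dst, src) would perform no such check. */
int copy_name(char *dst, size_t dst_cap, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_cap) {
        return -1; /* would overflow dst */
    }
    memcpy(dst, src, len + 1); /* len + 1 copies the NUL as well */
    return 0;
}
```

This is the kind of "effort and diligence" that has to be applied at every call site when the language does not track buffer sizes for you.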

vlovich123 115 days ago

The majority of security vulnerabilities in languages like C that aren’t memory safe are due to memory safety issues like UAF, buffer overflows etc etc. I don’t think I’ve seen finer grained research that tries to break it out by class of memory safety issue. The data is something like 80% of reported vulnerabilities in code written in these languages are due to memory safety issues. This doesn’t mean there aren’t other issues. It just means that it’s the cheapest exploit to search for when you are trying to break into a C/C++ service.

And in terms of how easy it is to convert a memory safety issue into an exploit, it’s not meaningfully much harder. The harder pieces are when sandboxing comes into play so that for example exploiting V8 doesn’t give you arbitrary broader access if the compromised process is itself sandboxed.

random_mutex 115 days ago

There is use after free

eecc 115 days ago

Majority. Parent said majority

IshKebab 115 days ago

Exactly. Use after free is common enough that you can't just assert that out-of-bounds is the majority without evidence.

taminka 115 days ago

actually you may be right, according to project zero by google [1], ~50% is use after free and only ~20% for out of bounds errors, however, this is for errors that resulted in major exploits, i'm not sure what the overall data is

[1] https://projectzero.google/2022/04/the-more-you-know-more-yo...