by dlor 1073 days ago
It's somewhat disheartening as a software developer focused on security that the top four elements are still:

* Out-of-bounds Write

* Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

* Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')

* Use After Free

10 comments

The gap between knowing what a CWE is and actually knowing, on code level, how it manifests and how you avoid these things is very large. Given how much the software industry has grown in the past 10 years, it's not particularly surprising.
> and actually knowing, on code level, how it manifests and how you avoid these things

You avoid them by using tools that make it difficult or impossible to introduce such vulnerabilities to begin with. Such as modern, memory safe programming languages.

For many decades, carpenters have been educated about table saw safety. But what finally stopped thousands of fingers getting chopped off every year was the introduction of the SawStop, and similar technologies.

Safety is a matter of using the right tools, not of "taking better care".

> For many decades, carpenters have been educated about table saw safety. But what finally stopped thousands of fingers getting chopped off every year was the introduction of the SawStop, and similar technologies.

AFAIK the technology isn't widespread and there are still tens of thousands of injuries per year.

You mean technology like bounds checking, invented during the 1950s with the creation of Fortran, Lisp and Algol, and present in every other language derived from them, with the exception of C, C++ and Objective-C?
And why did the whole world write so much code in C, C++ and Objective-C when bounds checking existed long before these languages, which lack it?
It started like this,

"Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own."

-- https://www.bell-labs.com/usr/dmr/www/chist.html

Then source tapes with an almost symbolic license price for the time, and a commentary book, did the rest.

With bounds checking, an out-of-range index still triggers an exception or runtime error. Many of those result in DoS.
Much better than silent data corruption.
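
A minimal sketch of that trade-off, in Python (chosen only for illustration; `read_slot` and `slots` are hypothetical names). The runtime's bounds check turns a past-the-end read into an error the caller can see, rather than a read of whatever happens to sit next to the buffer:

```python
def read_slot(buffer, index):
    """Return buffer[index]; the runtime raises IndexError past the end."""
    return buffer[index]


slots = [10, 20, 30]

try:
    read_slot(slots, 7)  # index 7 does not exist in a 3-element list
    outcome = "read succeeded"
except IndexError as exc:
    # The request fails (a possible DoS if nobody handles the error),
    # but no neighbouring memory is read or overwritten.
    outcome = f"rejected: {exc}"

print(outcome)
```

Note that Python also accepts negative indices, which wrap around to the end of the list, so this only illustrates the past-the-end case.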

Then there is the whole issue of making it more interesting to look elsewhere instead.

When a door is locked I can still break in by throwing a rock through the window, yet most people do lock the door nonetheless, while most thieves only bother to break the window if there is anything actually valuable in doing so.

Yeah at least in the US, it looks like tablesaw accidents that put people in the ER are about as common as they were 15 years ago. I have a buddy who just lost 6 months work because of a tablesaw accident.
Also SawStop doesn't prevent kickback, one of the other major sources of injury from a table saw.
Wow! SawStop is incredible tech. The blade stops within 5ms. That's insane.
Those two things aren't mutually exclusive. I'll bet a non-trivial number of XSS and SQL injection vulnerabilities came from people disabling input and output sanitation on solid frameworks and libraries because they didn't know why they shouldn't. Tools won't solve all of your problems-- you need knowledge, diligence, and tools that make doing the right thing easy.
> I'll bet a non-trivial number of XSS and SQL injection vulnerabilities came from people disabling input and output sanitation on solid frameworks and libraries because they didn't know why they shouldn't.

I will take this bet.

Searching Google for disabled sanitation "vulnerability", the first two hits are articles admonishing developers to not do it, and the third is a CVE, CVE-2023-1159, from a month ago that affects WordPress installations on which the developer disabled unfiltered_html, which is its built-in sanitation functionality.
Memory safety won't stop you writing SQL queries or dynamically generating HTML that accepts unsanitised user input.
You're right. Those things are stopped by other tools, such as query builders and web frameworks.
>Those things are stopped by other tools, such as query builders and web frameworks.

No. All tools can be used with an improper attitude which leads to the creation of weak points.

The proper way is to have a deep understanding of the role of design rules.

A programmer who does not pay attention to design (the very basic principles of the design process) can create a good game, and even if that game contains weaknesses, the related risk isn't a reason not to use it. The same programmer, when creating critical infrastructure software, is a potential source of nightmares.

Unfortunately, the software business accepts such specialists for both kinds of projects. Why? Who knows? Perhaps because of legal regulations? Why when an engineer designs a car they don't try to "Move fast and break things"?

> Why when an engineer designs a car they don't try to "Move fast and break things"

They do when they design submersibles or rockets.

C is the table saw of programming. C++ is the band saw.
I've used both saws and both programming languages. I still don't know which is worse.
XSS and SQLi can happen independently of the memory safety of your chosen programming language. You can use relatively safe frameworks or ORMs to generate HTML and interact with your DB, but there will sometimes be complex use cases that require you to extend or otherwise not use those safeguards.

Similarly, I imagine that there are cases where someone needs to do complex woodworking tasks that involve dangers which are less obvious than with a table saw.

I agree 100%, but in reality most people work with the language they are presented with.
XSS is a great example of that. On paper a ton of people know exactly what XSS is and does. In practice... simply don't allow user-controlled input to be emitted unescaped, ever. Good luck!

The reason XSS (and CORS) are tricky is that they fundamentally don't work in a world where a website may be spread over a couple of different domains. I get a taste of this in my day job where we have to manage cookie scoping across a couple of different region domains and have several different subdomains for different cookie behaviors. It's easy to be clean on paper up until you need to interface with some piece of software that insists on doing it its own way - for example the Azure Excel embedded functionality requires the ID token to be passed in the request body, meaning you have to pull in the request body and parse it in your gateway layer (or delegate that to a microservice)... potentially with multi-GB files being sent in the body as well!

It's super easy on paper to start from greenfield and design something that is sane and clean, bing boom so simple. But once you acquire a couple of these fixed requirements, the cleanliness of the system degrades quite a bit, because that domain uses a format that's not shared by anything else in the system, and it's a bad one, and we can't do anything about it, and now that's a whole separate identity token that has to be managed in parallel.

Anyway, you could say that buffer overflow or use-after-free are kind of an impedance mismatch for memory management/ownership in C. Well, XSS and CORS are an impedance mismatch for domain-based scoping models in a REST-based world. Obviously the correct answer is to simply not write vulnerable systems, but is domain-based scoping making that easier or harder?

Great examples. 1) You have to deal with your own complex systems where it becomes difficult and 2) you have to deal with external complex systems which enforce bad practice on you. One can see how it becomes borderline impossible not to slip once in a while.
Two of those four are things there's no need to make easy to do by mistake, but two popular programming languages choose to do so anyway and they reap the consequences.

Actually the SQL one is arguably in that category too, to a lesser extent. Libraries could, and should, make it obvious how to do parametrized SQL queries in your language. I would guess that for every extra minute of their day a programmer in your language must spend to get the parametrized version to work over just lazy string mangling, you're significantly adding to the resulting vulnerability count because some of them won't bother.

Bonus points if your example code, which people will copy-paste, just uses a fixed query string because it was only an example and surely they'll change that.
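
For reference, a small sketch of the two styles being compared, using Python's standard `sqlite3` module and its `?` placeholders. The `users` table, the function names and the payload are made up for illustration; other drivers use different placeholder syntax:

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_by_name_unsafe(name):
    # Lazy string mangling: the input becomes part of the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_by_name(name):
    # Parametrized: the input is bound as a value, never parsed as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_by_name_unsafe(payload)  # the OR clause matches every row
safe = find_by_name(payload)           # no user has this literal name
print(len(leaked), len(safe))
```

Here the parametrized version is no longer than the string-built one, which is the property the comment above is asking libraries to provide.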

I feel there would be some value in SQL client libraries that just flat out ban all literals.

I know it's the nuclear option, but decades of experience has shown that the wider industry just cannot be trusted. People won't ever change[1], so the tools must change to account for that.

[1] Unfortunately, LLMs learned from people... so... sigh.

Our industry is ageist and anti-intellectual. These are the symptoms of those.
While I agree that the software industry suffers from ageism and anti-intellectualism, these vulnerabilities are actually the symptoms of elitism, cargo culting, and traditionalism, which it also suffers from.
Maybe not ageist, but I do think it's easier to get younger people to work slavishly and pay them relatively less (on average, not everywhere pays like Bay area).
It's easy because there has never been a greater backlog of junior candidates trying to break into the industry.
Could be worded as Low barrier to entry and highly compensated.

Kids get into it just by having the tenacity to do whatever it takes to make it chooch. It's all that counts.

Ageist against old people? young people? middle-age people? I see at least these 3 categories are facing age related issues.
"But modern c++ is safe, preventing all those errors is as easy as not making them!..."
Is there some authoritative source for what is considered modern C++ and what is old? Most projects I've seen use a wide mix of C++ features of varying age. Using some C++23 features would not make a project modern if it still uses C++98 features you're not supposed to use.
Originally it refers to what was already possible in C++98, when one leaves behind the legacy ways of coding C with a C++ compiler.

Started with the publishing of "Modern C++ Design" from Andrei Alexandrescu in 2001.

https://en.wikipedia.org/wiki/Modern_C%2B%2B_Design

When ISO C++11 came to be, many re-used the term to mean C++11 or higher.

Given that many keep updating this to mean more modern versions, a well-known developer in the community (Tony Van Eerd) has joked that by the time of C++17 we were in Postmodern C++.

https://www.youtube.com/watch?v=QTLn3goa3A8

No idea what kind of modernism to call C++23, when C++17 was already postmodern, maybe Revivalist C++.

However it basically comes back to Andrei Alexandrescu's original ideas of programming in C++ as its own language, leave the C ways and pitfalls of resource management behind, learn to embrace a modern language for systems programming.

I should also note that there are developers against this philosophy; they advocate that C++ as understood by CFront is what one should care about, and thus the Orthodox C++ movement was born.

https://gist.github.com/bkaradzic/2e39896bc7d8c34e042b

> programming in C++ as its own language, leave the C ways

I'm with Kate Gregory on the "Stop teaching C" (actually Kate specifically means in order to then teach C++ but I also think it's probably fine to stop teaching C outright)

But whilst Kate is right in terms of pedagogy, as a larger philosophy this is inadequate. As a language C++ is obviously defective and the explanation is almost invariably "Because C" which only makes sense once you appreciate C++ in terms of C.

The built-in array type in C++ is garbage. Why is it garbage? This is a language with all these powerful features, why doesn't its array type leverage any of them? It's because this is actually the array type from C.

OK, maybe just the array type is trash, that's obviously not good, but it's one defect. How about string literals. Oops. C++ does sort of technically have the string literals you actually wanted, but the syntax for them is weird and you need the standard library not the core language... the ones you get for "Some text" are C's constant strings, an array of bytes with an extra zero byte, and well, the array type sucks.

This carries on: the language doesn't provide real tuples, it doesn't provide a real sum type, its built-in types don't believe in methods but user types do. Everywhere there are weird choices which are nonsensical except for the reality that it's what C does.

And then at the end of that, the language isn't actually compatible with C. It's close, a lot of stuff works, and more stuff kinda-sorta works enough that you may be surprised when it fails, but there isn't the sort of robust compatibility you might expect given the enormous sacrifices made for this goal.

I mostly agree with you.

The issue is how "worse is better" culture tends to win, and if the option is between C and C++ for a given scenario, then I definitely take C++.

However if the option pool is widened to more alternatives, then yeah, there should be a sound reason for still picking them for greenfield development, e.g. CUDA, a language toolchain based on LLVM,...

No, there's no such authoritative source - depending on context C++ fans will mix and match what is 'modern'.

It's somewhat similar to the C/C++ split. When it is convenient it's "C/C++" because "you can easily migrate your old C codebase to C++". But in other situations it's "C++", because C is old and more error prone and "we no longer manipulate raw pointers".

“Modern C++” is not necessarily tied to any specific standard, it is more a collection of ideas and philosophies. Although if I had to pick I’d say it really started with C++11.
not authoritative, but the really big c++ change was with c++11 - changes after that have been important, but perhaps more or less transparent to the average c++ user. and compiler support for c++11 is very good.
In fairness, only 2 of those 4 are actually memory-related.
And both have existing tools to find those bugs that people often just don't use.
Since 1979 with the invention of lint by Stephen Johnson at Bell Labs.

https://en.wikipedia.org/wiki/Lint_(software)

Static analysis as a bugfinding tool has proven to be insufficient, especially for large C++ binaries and JS programs. Both languages are nightmares for precise and scalable analysis.

Coverity exists. They've got a great product. But it doesn't solve the problem.

It doesn't solve everything, it solves even less when it isn't used.
In fairness, only C/C++ of all the currently commonly used languages can have half of the 4 top dangerous software weaknesses.
JavaScript routinely has the other half of the top4.
So do C and C++ when used in web or database applications. So they get 4/4
I agree that in principle the neutralization bugs aren't something C++ is necessarily making worse than, say, Python. But it'd be fascinating to see a study to figure out whether C++ programmers make these mistakes more often, or less often, or roughly the same.

An argument for more often: C++ is so complicated, maybe you're too busy with other problems to address the neutralization issue

An argument for less often: C++ teaches you to be careful and check everything to avoid nasty outcomes so that carries over to neutralization

It's somewhat disheartening as a security enthusiast that people only focus on "popular" security bugs and ignore the rest. The other top 21 bug classes aren't as "cool" but they will let me hack your app just the same.
Sure, but SQL Injection will let a script kiddie steal and/or drop your entire poorly configured production DB.
It also provides several paths to RCE depending on the environment, not just exfil.
SQL Injection is weird because it's been known for so long and modern frameworks usually have so many ways of avoiding it by default that one has to go out of their way to create an injection vulnerability, but it still happens often with greenfield code.
> Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

This is often ignored as it simply takes too much time and it often does not hurt much as it’s ‘internal’ (to the company using the saas or whatever).

Then ask yourself: how much have you done to prevent people choosing the wrong programming language? Because the PL has such a major influence, it's by far the most low-hanging fruit for tackling many of those issues.
Personally? I've done quite a bit here although there's always more. I worked at Google to fund Rust development internally and externally, helped sponsor the work that eventually led to getting Rust adopted in the Linux kernel, and now run a company that's building a new Linux distribution that prioritizes shipping code written in memory safe languages.

https://security.googleblog.com/2021/02/mitigating-memory-sa...

https://www.chainguard.dev/unchained/building-the-first-memo...

Oh awesome! Then I take off my hat. :-)
How many of these can Rust solve?

(Not on the "use Rust for everything" bandwagon, genuinely curious)

SQL injection and XSS are typically solved at a library/framework level instead of a programming language one, although type systems can help make those frameworks usable and work well.

Either way, they're effectively "solved" from a programmer's perspective if you're willing to adopt modern frameworks instead of string-concatenating HTML or SQL manually.
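
A rough sketch of what such frameworks do on your behalf, using `html.escape` from Python's standard library. The `render_comment*` helpers are hypothetical; real template engines apply this kind of escaping automatically and in a context-aware way, which this does not attempt:

```python
import html


def render_comment_unsafe(text):
    # Manual string concatenation: user input is emitted as markup.
    return "<p>" + text + "</p>"


def render_comment(text):
    # Escape &, <, > and quotes so the input is rendered as plain text.
    return "<p>" + html.escape(text) + "</p>"


payload = "<script>alert(1)</script>"
unsafe_markup = render_comment_unsafe(payload)
safe_markup = render_comment(payload)
print(safe_markup)
```

Escaping like this only covers element content and quoted attribute values; URLs, inline scripts and CSS need their own handling, which is part of why a framework is the usual recommendation.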

Judging from my limited experience the first and fourth are either caught by the compiler or at least result in a panic in some cases.

The middle two are out of reach of a typical PL or type system (there are exceptions like Ur, but I don't think it's adopted widely). It's a problem that is typically solved via libraries and Rust is not unique in terms of providing safe libraries around generating SQL or HTML.

2 of the 4 listed.
With a bit of creativity, you can use static typing systems to at least slant the table in your favor with SQL, HTML, and in general, structured text output. It's hard to completely ban string concatenation because you will eventually need it, but you can make it so doing the right thing is easier than the wrong thing.

However, existing libraries for statically-typed languages often don't do the work or apply the creativity and end up roughly as unsafe as the dynamically typed languages.

It's a bit of a pet peeve of mine.
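
One way to read "slanting the table", sketched in Python under loud assumptions: `SafeHtml`, `text` and `tag` are invented names, not an existing library. Since Python's type hints are only enforced by an external checker such as mypy, the sketch also checks at run time:

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeHtml:
    """Markup that has already been escaped or built from trusted parts."""
    markup: str


def text(value: str) -> SafeHtml:
    # The easy path from a plain string into markup always escapes.
    return SafeHtml(html.escape(value))


def tag(name: str, *children: SafeHtml) -> SafeHtml:
    # Builders accept only SafeHtml, so a raw str cannot slip through.
    for child in children:
        if not isinstance(child, SafeHtml):
            raise TypeError("expected SafeHtml; wrap plain strings with text()")
    inner = "".join(child.markup for child in children)
    return SafeHtml(f"<{name}>{inner}</{name}>")


page = tag("p", text("<b>hi</b>"))
print(page.markup)

try:
    tag("p", "<b>raw</b>")  # a type checker would also flag this call
    rejected = False
except TypeError:
    rejected = True
```

Nothing stops a caller from constructing `SafeHtml` directly around raw input, so the wrong thing remains possible; it is just more visible than the right thing.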

It could, but it will be decades before Rust adoption is where C/C++ is today, so in the meantime it would be nice to see some other, more practical and short-term solution to these problems. Otherwise I can predict at least 50% of the top 4 for a decade ahead.
Hence why all major OS vendors are embracing designs with hardware memory tagging, which is the last frontier of possible mitigations.
Items 4 and 12 and only in obvious cases.
For 1, scan the whole code base, warn on any use of strcpy/strncpy/etc. and replace them with snprintf; no APIs without a length argument shall be allowed.

For 4, the static analyzer should help; also set your pointer to NULL immediately after free (for double free).

Static detection of UAF is grossly incapable of actually protecting real C++ applications. It can find some bugs, sure. But a sound analysis is going to just throw red all over a codebase and get people to disable it immediately.

Changing everything to take lengths is definitely a good change - but challenging to retrofit into existing codebases. Apple has a neat idea for automatically passing lengths along via compilation changes rather than source changes, but if you want to do things in source you have to deal with the fact that there is some function somewhere that takes a void*, increments it locally, reinterpret_casts it to some type, and then accesses one of its fields and you've got a fucking mess of a refactor on your hands.

> top four elements are still

Use after free is actually gaining popularity, up 3 since last year.