by cvccvroomvroom 1474 days ago
HeartBleed, Spring4Shell, LogJam, Struts, Jetty and many more beg to differ.

More LoC is always a greater attack surface, regardless of development trustworthiness.

Minimize code ruthlessly.

4 comments

The corollary to "Minimize code ruthlessly" is "Roll your own X". Sure Heartbleed was bad, but are you going to write your own TLS library?

IMO, the OP post has an unfounded sense of hubris. Everyone else's code is bad except for me, who only writes minimal code with no exploits.

Libraries are generally larger because they are usually written for more use-cases. If your program is only referencing a small portion of a library, then rolling your own doesn't mean rewriting the library. Your code will be more minimal because it's only written for your program.

Assuming you're fallible and write code with exploits like everyone else, your program will probably have different exploits from mine. An attacker won't be able to target a widely used library and enjoy being able to pick from several targets. Instead, they'll have to tailor an attack for your code base. Each exploit found yields less of a reward, which will frustrate less motivated attackers.

This comment inflates the upside (the chance that every developer reinventing the wheel makes a hack of their code less likely to succeed) and discounts the downside (the increased costs of giving up specialization and the consolidation of code into reusable libraries).

> Instead, they'll have to tailor an attack for your code base.

This misstates the actual threat.

Assume detection of exploits is automated (even if it isn’t completely automated now). Assume exploit code is modularized and passed around on hacker markets, because it is.

Your recommendation only increases the cost to exploit by a small factor, but it also increases the costs of legitimate usage by possibly a much larger factor (patching the inevitable bugs which are found). Because developers aren’t specializing (in how to write logging code efficiently and safely), bug counts in both the design phase and the implementation phase will be higher than in a counterfactual world with specialization.

This is just security through obscurity. It's no different than skipping password-hashing because you think you are too small to be attacked (after all you can minimize one more dependency). It works up until the point your business is actually worth anything.

Libraries are not only larger because they are written for more use-cases, but also because they cover more tested edge cases.

> The corollary to "Minimize code ruthlessly" is "Roll your own X". Sure Heartbleed was bad, but are you going to write your own TLS library?

But "don't roll your own crypto" doesn't generalize to using every possible library to avoid writing any code. Down that path lies the madness of leftpad.

Crypto and security protocol implementations are exceptionally difficult to write in a way that avoids all exploits, so, yeah, don't write your own TLS library unless you're willing to fund a team of all the necessary experts, which is going to be very difficult.

OTOH a very large number of libraries implement simple things in overly complex ways (sometimes by necessity because a library needs to try to be all things for all people) and can be very easily replaced by small amounts of code that do only what my product needs.
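
To make the scale concrete, here is a minimal Python sketch of a left-pad helper of the kind being discussed. The name `left_pad` and its signature are made up for illustration (this is not the npm package's API), and in Python specifically the built-in `str.rjust` already covers this case.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is at least `width` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    # Strings that are already long enough are returned unchanged.
    missing = width - len(text)
    return fill * max(missing, 0) + text


if __name__ == "__main__":
    print(left_pad("42", 5, "0"))
    print(left_pad("hello", 3))
```

A helper this size is cheap to own. The harder judgment calls are the cases in between, where "small" is an estimate rather than a fact.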

Every third-party library you add is an additional source of bugs, unmanaged surprises (e.g. they decide to break their APIs for the lulz), attack surface and constraints. It is wise to weigh the benefits against the drawbacks for each library individually before making it part of your code.

My counter-point is twofold. Everyone understands not to roll their own crypto, not because developers intrinsically know crypto is hard but because of sustained marketing by cryptographers. It's an easy counter to point to when people swing too far into NIH. Following that, I think a lot of people don't realize when they are in over their heads; it happens a lot in software estimation. It's easy to look at something like logging and say "I can just write printf, why does logging-library-x have so much bloat" without realizing how complex a good logger can get. Software architecture is judged by how easy it is to maintain over time; what is a small amount of code today can easily balloon into something unmanageable.
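
For a sense of what sits between printf and a full logging library, here is a rough sketch using only Python's standard `logging` module: level filtering plus one JSON object per line. `JsonFormatter`, `make_logger`, and the `context` key are hypothetical names chosen for this example, and it leaves out most of what tends to make real loggers grow (rotation, async handlers, sampling, redaction).

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical attribute, supplied by callers via `extra=`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, stream, level: int = logging.INFO) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream` at `level` and above."""
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.setLevel(level)
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger("payments", buffer)
log.debug("cache miss")  # below INFO, so it is dropped
log.warning("charge failed", extra={"context": {"order_id": 1234, "retries": 2}})

if __name__ == "__main__":
    print(buffer.getvalue(), end="")
```

Even this much is more than printf, and it is still leaning on a library, just one that ships with the language.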

At the same time, there's a lot of hubris that goes into the statement "Every third-party library you add is an additional source of bugs, unmanaged surprises", but somehow the code you write was typed by God himself.

Everyone points to the leftpad madness of dependencies despite the fact that the dominoes leading up to leftpad were the work of one very dedicated individual, and despite other languages with the same kind of simple dependency management never having anywhere close to the same issue (like Ruby, on which npm was based). Regardless, the OP doesn't use leftpad as an example; he uses log4j. I agree that log4j had insane defaults, and that's on the maintainers, but the implication is that you should roll your own logging facade. I wouldn't want to work on a project where all the logging is done through printf (except probably in an embedded environment), or maintain an in-house logging library because the original writer thought it would be "just small amounts of code".

> At the same time, there's a lot of hubris that goes into the statement "Every third-party library you add is an additional source of bugs, unmanaged surprises", but somehow the code you write was typed by God himself.

You (everyone) shouldn't allow hubris to drive these decisions; instead, make an objective evaluation.

If you're going in saying "I can write this same functionality better than anyone else so I'll just do it", stop and rethink.

Every line of code (to an approximation) adds more potential bugs. So if you pull in a library with a million lines of code, you've added potential future bugs on that order of magnitude. If you actually need most of those million lines because you use all the features, then you need them. In that case it would be hubris to assume that you can write the same but slightly different million lines and do much better. You can't.

But as other subthreads here point out, often you really use like a thousand lines out of that million-line library. If that's your use case, you can reduce your attack surface by 999,000 lines of code by writing a few functions of your own, customized to your needs without any unused frills. You don't need a god on your staff, just someone who values the reliability and security that simplicity brings.

> unmanaged surprises

To unpack this statement, I meant that when using a third-party library you are subject to its maintainers' schedule and roadmap, over which you have no say and which you can't manage. They might drop a feature you need, or break APIs in a minor release, etc. This is undeniably true, so it must be taken into account when doing a risk analysis on depending on some library. Well-run libraries offer a stable platform, but tons of libraries are not well-run from a release engineering aspect. Choose carefully and know what you're getting into.

"Don't use OpenSSL unless you absolutely have to. Preferably, disable every other feature you don't need. And keep all of your vendored dependencies up-to-date."

He's not comparing quality so much as quantity. Building a small utility function, general enough for your intended use, brings in less code than adding a library dependency. That's the argument.

TLS is an example of when you probably would not make that trade-off, but there are lots of other examples where it would make sense.

Sure, but the parent post is about logging, which isn't a small utility function. `printf` works for scripts up until it doesn't. When you start needing structured logging and log levels, and you are debugging performance issues caused by logging, you start to understand why that complexity is put in place. Leftpad? Sure, that's extraneous. But logging?

Yes, of course. Unless you absolutely have known working pristine code free of side-channels for SHA3, ChaCha20-Poly1305, etc. don't even think about it.

The same goes for parsing and other complicated operations containing pitfalls.

Honestly, I wish most crypto, parsing, numerics libs, and OS kernels and drivers were rewritten in Rust and given C, C++, etc. interfaces.

The intended way to look at this is, “When I write code, I know what I wrote and the processes in my team (should) ensure other teammates know it as well”.

But what’s the back pressure to this advice? If I need to multiply matrices, should I write my own to avoid including MKL BLAS? What’s the heuristic that determines when a dependency is worthwhile?

I have written a matrix multiplication lib once because I had some a priori knowledge about the data and could remove half the operations.
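
As one illustration of that kind of shortcut, here is a sketch in plain Python: a naive triple-loop multiply, and a variant for one hypothetical kind of prior knowledge. The comment doesn't say what was known about the data; the assumption here is that the product is always A·Aᵀ, which is symmetric, so only the upper triangle needs computing. `matmul` and `gram` are illustrative names, and neither will come close to a tuned BLAS on speed.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an n x m matrix and an m x p matrix."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]
            for j in range(cols):
                out[i][j] += a_ik * b[k][j]
    return out


def gram(a: Matrix) -> Matrix:
    """Compute a times its transpose, doing dot products for the upper triangle only."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(x * y for x, y in zip(a[i], a[j]))
            out[i][j] = dot
            out[j][i] = dot  # mirrored entry costs no extra multiplications
    return out


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    print(matmul(a, [[5, 6], [7, 8]]))
    print(gram(a))
```

`gram` does n(n+1)/2 dot products instead of n², which is roughly the "half the operations" saving, but only because the input is known to have that shape. A general-purpose library can't assume that.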

Everything has a cost (usually in time) and everything comes with trade-offs (usually a different set of bugs). Not-invented-here comes with a gigantic upfront time cost and a larger set of bugs. The best heuristic depends on what you’re optimizing for. There’s never an easy answer.

As usual, it depends. A naive matrix multiplication function is pretty simple. If that's all you need and can do with sub-optimal performance, I'd write my own.

It’s less about lines of code and more about scope of functionality.

> Minimize code ruthlessly.

Minimize functionality ruthlessly.

> More LoC is always a greater attack surface

More… than what? What does the counterfactual look like?

If I only care about 1 application in a vacuum, reducing LoC is not terribly difficult. If I run my application on any modern OS, I depend on thousands of applications, daemons, libraries, and a kernel. I would far rather their developers take reasonable efforts to import common libraries when appropriate. The aggregate LoC of an ecosystem is more important than the LoC of a single application.

Also, telling people the metric of relevance is LoC is wrong and will lead people to game the metric, losing sight of the actual goal of code quality. There are infamous examples of Perl code golf; they optimize for LoC, but aren’t at all useful for code quality or security.

Far better to expand LoC a reasonable amount in favor of developer readability and to reduce complexity.

I'll say the quiet part out loud: nobody got fired for using a dependency with any of these bugs, and nobody got promoted for not doing so.

Obviously, pulling in something like left-pad is worthy of derision. But generally you should pull in whatever dependencies let you go faster, and minimize the amount of time spent planning for black swan events.