Hacker News
by pcwalton 3780 days ago
A lot of apps (old-timey Windows apps, for example) have this philosophy, leading them to reinvent things like crypto and image decoding. Naturally, this leads to tons of bugs, including security bugs.

I would revise this to: Don't bring in more code than you need. But if the choice is between writing something yourself and using someone else's well-tested, heavily-used library, always go for the latter.

8 comments

As an architect, you need to be able to do a cost/benefit analysis of each option. That is what software architects are for, and it is why experience matters. For example:

  How much time will it take to implement each option?
  How much time will it take in the future to support it?
  What security risk does each option incur?
  What is the risk of the project being abandoned?
  What is the risk of the project changing in non-backwards compatible ways?
  What are the performance characteristics of each option?
NIH is a disease, but so is import-mania. With experience, you can make a good decision.
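To make the checklist above concrete, here is a minimal Python sketch of a weighted decision matrix, which is one common way to write such a cost/benefit comparison down. The weights, the 0-5 cost scale, and the option names are all hypothetical. The arithmetic only makes the trade-offs explicit; it doesn't replace the judgment that comes with experience.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each concern from the checklist matters to *this* project.
WEIGHTS = {
    "implementation_time": 2,
    "support_cost": 3,
    "security_risk": 3,
    "abandonment_risk": 2,
    "breaking_change_risk": 2,
    "performance": 1,
}


@dataclass
class Option:
    name: str
    costs: dict  # concern -> estimated cost, 0 (none) to 5 (severe)


def weighted_cost(option: Option) -> int:
    """Sum each estimated cost times its weight; lower is better."""
    return sum(WEIGHTS[concern] * option.costs[concern] for concern in WEIGHTS)


def rank(options):
    """Cheapest option first."""
    return sorted(options, key=weighted_cost)


write_it = Option("write it ourselves", {
    "implementation_time": 4, "support_cost": 4, "security_risk": 3,
    "abandonment_risk": 0, "breaking_change_risk": 0, "performance": 1,
})
use_library = Option("use a mature library", {
    "implementation_time": 1, "support_cost": 1, "security_risk": 1,
    "abandonment_risk": 2, "breaking_change_risk": 2, "performance": 2,
})

for opt in rank([write_it, use_library]):
    print(opt.name, weighted_cost(opt))
```

With these made-up numbers the library wins (18 vs 30). A project that weighted abandonment or breaking-change risk much more heavily could come out the other way.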
Also, de-duplicating dependencies is pretty big. For instance, in Java land, I have a project built on Camel, and Camel is deeply in love with the Jackson JSON parser. So I use it there.

On the other hand, I've been learning to use GSON's parser in my Sponge plugin (for a Minecraft server) because the SpongeAPI dependency pulls that in anyway.

Admittedly, both libraries are dead simple to use, so it's a bit contrived, but I see a lot of projects that pull Spring's RestTemplate into a Camel project when they've already pulled in CXF or have Apache's HttpClient readily available via other dependencies.
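A rough Python sketch of the de-duplication check: before adding a JSON or HTTP library, look at what the build already pulls in transitively. The graph below is a simplified, hypothetical version of the two projects described. In practice the data would come from your build tool's dependency report rather than a hand-written dict.

```python
# Simplified, hypothetical dependency graph: artifact -> direct dependencies.
GRAPH = {
    "camel-project": ["camel-core", "camel-jackson"],
    "camel-core": [],
    "camel-jackson": ["jackson-databind"],
    "jackson-databind": ["jackson-core"],
    "jackson-core": [],
    "sponge-plugin": ["spongeapi"],
    "spongeapi": ["gson", "guava"],
    "gson": [],
    "guava": [],
}


def transitive_deps(graph, root):
    """Return every artifact reachable from root (excluding root itself)."""
    seen, stack = set(), list(graph.get(root, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def already_available(graph, root, candidates):
    """Which candidate libraries does this build already pull in?"""
    present = transitive_deps(graph, root)
    return [c for c in candidates if c in present]


json_libs = ["gson", "jackson-databind"]
print(already_available(GRAPH, "camel-project", json_libs))
print(already_available(GRAPH, "sponge-plugin", json_libs))
```

If a candidate is already present, reusing it costs nothing extra; pulling in a second library for the same job adds another thing to keep updated.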

(And no, URLConnection is terrible. TERRIBLE.)

One thing I've gotten into the habit of doing is looking around the commit history and issue list for any package I import. Was it something somebody wrote in a hurry and hasn't really touched since? Is it something that has a solid set of regular contributors? Are there a lot of outstanding issues relative to how heavily used it is?

I also spend more time actually reading through the specs (the test suite) to see how well they exercise the code.

That's probably standard procedure for a lot of people, but it's something that I had to learn to always do.
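The due-diligence questions above can be turned into a small checklist. This Python sketch assumes you have collected the numbers by hand (last commit date, active contributors, open issues, download counts). The field names and thresholds are hypothetical, and the output is a list of things to look at more closely, not a verdict.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RepoStats:
    # Hypothetical numbers gathered by hand from the repository page.
    last_commit: date
    active_contributors: int  # e.g. distinct committers in the last year
    open_issues: int
    weekly_downloads: int


def red_flags(stats: RepoStats, today: date) -> list:
    """Return human-readable warnings; the thresholds are arbitrary examples."""
    flags = []
    if (today - stats.last_commit).days > 365:
        flags.append("no commits in over a year")
    if stats.active_contributors < 2:
        flags.append("single maintainer")
    # Many open issues on a lightly used package is more worrying than on a hugely popular one.
    if stats.open_issues * 1000 > stats.weekly_downloads:
        flags.append("many open issues relative to usage")
    return flags


today = date(2016, 3, 1)
hurried = RepoStats(date(2014, 1, 10), 1, 40, 5_000)
maintained = RepoStats(date(2016, 2, 20), 12, 150, 2_000_000)
print(red_flags(hurried, today))
print(red_flags(maintained, today))
```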

That's a good idea.
Good engineering managers and architects are great at balancing both biases.
Balance is good. Unfortunately ideologues have taken over software as they have taken over politics.
Are there really people out there whose decision-making process goes like "well, we don't really need this library, but I'm gonna depend on it anyway to further the ideology of our movement"?
Ideologies are rarely presented as such. I have seen places where they wrote databases from scratch and rewrote the Windows scroll bars. There were always 100 good reasons why using something else wasn't good enough. Other times I've seen people always want to buy the shiny new toy rather than write a few lines of code themselves.
I know plenty of people who want to port everything to JavaScript. When you ask "Why?" they don't know. That's an ideology.
In what sense are technical preferences you cannot explain or justify an "ideology"? Ideology is not unjustified/mistaken belief. Maybe you mean cargo culting in technology, of which there is sadly a lot? Or maybe even "let's do this in JavaScript, because it's the only tech I know and I can't be bothered to learn something new"? That happens, but it's not ideological.
"But if the choice is between writing something yourself and using someone else's well-tested, heavily-used library, always go for the latter."

Absolutely. However, there are plenty of situations where what you pull down from npm or rubygems isn't actually all that well-written or well-tested.

When I first started programming I kind of had this impression that if an open source library is published on a package repo and people are using it then it must be much better than anything I could write. I have learned the hard way over the years that is not always true.

I feel like I relearn this lesson at least once a month.
There's an important kind of compromise that isn't discussed as often: using an external library, but behind an interface of your own design, an "anti-corruption layer". If the external dependency is limited to one small bridge in your application, then it's much easier to see which parts of the dependency you actually depend on, to upgrade the dependency when its API inevitably changes, and to replace it entirely if it becomes a burden.
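A minimal anti-corruption layer might look like the Python sketch below. The standard library's `json` module stands in for the external dependency, and `load_settings`, `dump_settings`, and `ConfigError` are hypothetical project names. The rest of the code base calls only these functions and catches only `ConfigError`, so replacing or upgrading the parser touches one file.

```python
import json
from typing import Any, Dict


class ConfigError(Exception):
    """Project-owned error type, so callers never catch the library's exceptions."""


def load_settings(text: str) -> Dict[str, Any]:
    """The one place in the project that knows which JSON parser is used."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Translate the library's error into our own vocabulary.
        raise ConfigError(f"invalid settings: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ConfigError("settings must be a JSON object")
    return data


def dump_settings(settings: Dict[str, Any]) -> str:
    """Serialize with stable key order so diffs stay readable."""
    return json.dumps(settings, sort_keys=True)


print(load_settings('{"debug": true}'))
print(dump_settings({"b": 1, "a": 2}))
```

The narrow surface also documents exactly which features of the dependency the project uses: here, parsing a top-level object and sorted-key output, nothing more.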
I'll link to the facade pattern, since last time I mentioned it people went a week or so before realizing it wasn't a term of my own invention:

https://en.wikipedia.org/wiki/Facade_pattern

The downside is that you bring in another layer of indirection and, as a result, greater cognitive load for yourself.

My code -> facade -> library

Navigating code becomes more cumbersome and stack traces longer. I do like this approach too but it isn't free.

That reminds me of another thing I'd like to tell library writers: please think of your stack traces as somewhat public, because I always end up 40 layers deep trying to debug something and it's just painful.

Hopefully a facade will introduce a small constant number of stack frames...

Anyway, with something like Lodash for JavaScript, I'd even want to "facade" that into a project-specific utilities thing. Right now we're on an old major version of it and upgrading would require changing hundreds of locations. When very many files in a project mention the same external dependency, that seems like a recipe for future sadness.

A facade should only add one layer of indirection.
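One way to check the "one layer" claim, sketched in Python with the standard `json` module standing in for the library and a hypothetical `parse_config` wrapper: trigger the same parse error directly and through the facade, then count the frames in each traceback.

```python
import json
import traceback


def parse_config(text):
    """Hypothetical facade: the only project function that calls the json module."""
    return json.loads(text)


def frames_until_error(fn, arg):
    """Call fn(arg), expect a parse error, and count the frames in its traceback."""
    try:
        fn(arg)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return len(traceback.extract_tb(exc.__traceback__))
    raise AssertionError("expected a parse error")


direct = frames_until_error(json.loads, "{not json")
wrapped = frames_until_error(parse_config, "{not json")
print("extra frames added by the facade:", wrapped - direct)
```

The absolute frame counts depend on the Python version and on how `json` is implemented internally; only the difference of one frame is the point.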

As for project-level management of external dependencies, the tooling can be used to provide a facade for imports. I'm not sure about Webpack, but JSPM already does this using a config.js file that maps all of the dependencies to readable import names, sans version numbers, so the site doesn't break on future updates.

Once ES6 modules are used more widely, it would be great to see libraries adopt the facade pattern to provide finer-grained control without deep-linking into a project's source.

>That reminds me of another thing I'd like to tell library writers: please think of your stack traces as somewhat public, because I always end up 40 layers deep trying to debug something and it's just painful.

Unless you're using Spring, of course; then all bets are off. That's the biggest downside of IoC containers: they tend to ruin the usefulness of stack traces and of the debugger's "step in/step out" functions.

There was a code base where people wrapped glibc functions. Most of the time the wrappers were straight calls to the corresponding glibc functions, but under x-prefixed names, so malloc became xmalloc, free became xfree, &c.

At one time there were a lot of zombie processes lingering for a long-ish time, until the parent terminated and init reaped them. I didn't bother to look at the implementation of xpopen, as I assumed it was just a call to popen. It turned out it wasn't: it was fork/exec with a socketpair turned into a FILE* with fdopen, and xpclose never waited for the child.
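For contrast, here is a Python sketch of a popen-style wrapper whose close step does wait for the child. `PipeReader` is a hypothetical name, and `subprocess.Popen.wait()` plays the role that `waitpid` (or simply calling `pclose`) would play in C; the bug in the story is the equivalent of leaving that call out.

```python
import subprocess
import sys


class PipeReader:
    """Hypothetical popen-style wrapper: run a command and read its stdout."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        self.stream = self._proc.stdout

    def close(self) -> int:
        """Close the pipe, then wait for the child and return its exit status.

        Waiting is what lets the OS reap the finished child; in the C story,
        xpclose skipped this step, so finished children lingered as zombies.
        """
        self.stream.close()
        return self._proc.wait()


reader = PipeReader([sys.executable, "-c", "print('hello')"])
output = reader.stream.read()
status = reader.close()
print(repr(output), status)
```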

I think there can be times when the facade pattern makes sense. I think there can be times when importing the world makes sense. I think there can be times when the opposite is true too. I think talking about these things in an abstract way can miss the point of the very insanity in some concrete solutions out there.

Heh. Well, yeah, abstract opinions are always suspicious. In your case, making a facade around standard POSIX functions does seem weird, especially if the facade is itself buggy! For something that needs to be portable across many platforms, such a facade could be very useful though.
Hah I stopped worrying about stack trace length when I started writing Scala ;)

Usually only 1 or 2 lines of the trace matter, you learn to skip the rest pretty fast.

Here's an interesting story from the Java world - "Filtering the Stack Trace From Hell":

https://dzone.com/articles/spring-vs-java-ee-the-real-story-...

An even more breathtaking stack trace image can be found here:

https://ptrthomas.files.wordpress.com/2006/06/jtrac-callstac...

And the PDF version:

https://ptrthomas.files.wordpress.com/2006/06/jtrac-callstac...

While I completely agree with the sentiment[1], there is a bit of hyperbole (and/or literary license) in the suggestion to "Kill Your Dependencies". Modular, well-contained code is very good. It's usually a good idea to build on other people's work, though this is yet another trade-off decision that will always be part of the software design process.

A lot of the dependencies discussed in this blog post are libraries that aren't actually adding anything useful: the various JSON parsing libraries that should be replaced with the parser in the stdlib, or RSpec testing libraries that should never have been regular runtime dependencies.

[1] Managing complexity and dependencies is probably the most important concern going into the future, not just in programming, but also in every other complex system.

> Don't bring in more code than you need.

I see it as a sliding scale. If I'm parsing 1 string with the same date format into 1 object, I'm not going to pull in some general purpose time parsing library - I'll write the 10 lines of code myself, a few unit tests, and be happy.

If in the future I start having to deal with different date strings and some need to do more than just throw up a single date on a page somewhere, I'll get a date/time library.

You could also use the third party library for your unit tests.
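Both ideas fit together: a hand-written parser for the one format you receive, unit-tested against a full library. In this Python sketch the day/month/year format is a hypothetical example, and the standard library's `datetime.strptime` stands in for the third-party date library that would only be used in tests.

```python
import re
from datetime import date, datetime

# Hypothetical single format we receive, e.g. "14/03/2016" (day/month/year).
_PATTERN = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def parse_day(text: str) -> date:
    """Hand-rolled parser for exactly one date format; nothing else is accepted."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected date format: {text!r}")
    day, month, year = (int(part) for part in match.groups())
    return date(year, month, day)  # date() still rejects impossible days like 31/02


# Unit tests: compare against the full library, used here only as a reference.
for sample in ["14/03/2016", "01/01/2000", "29/02/2016"]:
    assert parse_day(sample) == datetime.strptime(sample, "%d/%m/%Y").date()
```

If the inputs later start arriving in several formats, that is the point at which the reference library graduates from the tests into the application.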
This is pretty much what Rob Pike advocates in Go: "A little copying is better than a little dependency." http://go-proverbs.github.io/
I pretty squarely disagree with Rob Pike on that one. Copying is how you introduce bugs and insulate yourself from upstream bugfixes. I'm suggesting that you should try to remove code first, add a dependency on well-trusted code if that doesn't work, and only copy/reinvent as a last resort.

  > I pretty squarely disagree with Rob Pike on that one.
TBH that's kind of an indication that you should rethink your position. Rob Pike has a lot of experience, he's seen a lot, and he knows what he's doing. I'm not saying you're wrong, just that you shouldn't snap to the defensive position.
Rob Pike's authority does not counter the negative experience I've had with vendoring dependencies, making local changes, and drifting so far from upstream that it becomes extremely difficult to merge in bugfixes. Or the positive experience I've had with package managers making it easy to bring in third-party code, easier than copying it in.
"Little dependency" is the key phrase.
Little dependencies are fine. In fact, they're usually preferable: it leads to less code going unused.
> Rob Pike has a lot of experience, he's seen a lot, and he knows what he's doing.

So does pcwalton. Ever heard of https://www.rust-lang.org/ ?

It doesn't matter. If I were Einstein and von Neumann disagreed with me, I would take that seriously, even though I'd be a super-genius etc.
I do take it seriously. I'm no super-programmer. Rob Pike is a better programmer than I am.

I just disagree with him on this one thing.

For me, that means I listen carefully for wisdom but have to speak louder when countering their occasional bullshit. Copying is considered a code smell by the likes of Fowler because of the problems it leads to. It also leads to bloat and performance issues. Better to cleanly and simply package up reusable solutions to problems like JSON or protocols (e.g. HTTP), then just keep importing the same one. You get lean apps plus a greater understanding of what they're doing.

And so does the person you hire down the line to extend that app. Never forget that part when talking copying and tweaking code. :)

An appeal to authority? To Patrick Walton?
That may be true for Go, a language which still does not have a great dependency management story. I'm not sure this proverb is universal, though.
Nice list. He clearly doesn't like reflection:

  > Clear is better than clever.
  > Reflection is never clear.
Of course, everything has its place, even reflection.
Reflection is a sign that your system is inadequate. In some (maybe even all) languages it may be necessary, but for a language designer it is a failure.
Eh, I am using reflection in a small (C#) project at work - I've had to implement a unit test system of sorts (yes, NIH, reinventing the wheel etc etc) and reflection lets me find all methods that return a particular type very easily. I agree it can become spaghetti very quickly but it is quite useful at times.
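The same idea as a rough Python analogue (in C# one would presumably walk `Type.GetMethods()` and compare each `MethodInfo.ReturnType`): find every method annotated to return a hypothetical `CheckResult` type and run it. The class and method names are made up for illustration.

```python
import inspect


class CheckResult:
    """Hypothetical result type; returning it is what marks a method as a test."""

    def __init__(self, name: str, passed: bool):
        self.name = name
        self.passed = passed


class ArithmeticChecks:
    def check_addition(self) -> CheckResult:
        return CheckResult("addition", 1 + 1 == 2)

    def check_division(self) -> CheckResult:
        return CheckResult("division", 7 // 2 == 3)

    def helper(self) -> int:  # not a test: different return type
        return 42


def discover(instance, return_type):
    """Find bound methods whose annotated return type is exactly return_type."""
    return [
        method
        for _, method in inspect.getmembers(instance, predicate=inspect.ismethod)
        if inspect.signature(method).return_annotation is return_type
    ]


results = [test() for test in discover(ArithmeticChecks(), CheckResult)]
for r in results:
    print(r.name, "ok" if r.passed else "FAILED")
```

As the comment says, this is convenient for a small test runner, but the discovery rule lives in the annotations rather than in explicit calls, which is where the spaghetti risk comes from.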
The funny thing being that the Golang stdlib uses reflection.
So many libraries in the wild aren't "well-tested, heavily used", though, and sometimes it is really hard to differentiate between popular and good.

At the end of the day, the only person who is responsible for the quality of your project is you. You have to figure out which parts of your project are key to your operation and which are just window dressing. If your job is to make a blogging platform, I would expect many of its document-formatting features to be reimplementations of other people's work, because for your core purpose you have to know you are relying on yourself.

I also don't understand the heartburn other developers have over knowing "OMG L, THERE IS SOMEONE ON TEH INTARWEBS AND THEY ARE REINVENTING THE WHEEL". It smacks of fear, a fear that is ultimately rooted in insecurity. If a person was secure in their knowledge of their skills, their ability to understand problems and fix them, then there should be nothing to fear from a dozen or a million different libraries doing the same thing and running into one or two of them on one's next project. It is just a matter of course.

> If a person was secure in their knowledge of their skills, their ability to understand problems and fix them, then there should be nothing to fear from a dozen or a million different libraries doing the same thing and running into one or two of them on one's next project.

I actually do fear a million reimplementations of, say, RSA.

The impact of bad software is vastly overrated. Almost all of it is bad already, and yet we haven't killed ourselves off as a species yet.
The key phrase is "well-tested, heavily used library". Especially in the web area, I see a lot of libraries imported from GitHub or wherever to do simple things. In the end you have dozens of dependencies and don't know how or when to update them.

My rule of thumb: if what we need can be reduced to a few functions, it's better to copy them, so you at least know which code you are using and don't have thousands of lines of code in your repo that you don't know are ever used.

Copying doesn't make anything better; it just insulates you from upstream bug fixes. If copying is better than adding a line to your Gemfile or whatever, then that's a usability bug in your package manager. The entire reason for package managers' existence is to provide an easier-to-use, more reliable alternative to copying.
Copying gives you stability. I work in a regulated environment so every change has to be scrutinized. Updating a library like jQuery is a big deal. If you use 10 libraries of decent size you will most likely rarely update them because it's too much work and risk.

By isolating out parts you actually need you have a better chance of making updates with reasonable effort.

Not saying this is for everybody but a lot of dependencies can be a killer.

A good package manager that supports lockfiles addresses this issue.
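Roughly, a lockfile records the exact version resolved for every package, and the build refuses to run with anything else, so upgrades happen only when someone deliberately regenerates the lock and reviews the change. Package managers such as Bundler (with `Gemfile.lock`) do this themselves; the Python sketch below only illustrates the check, and the package versions are hypothetical.

```python
# Hypothetical lock: the exact versions that were reviewed and approved.
LOCK = {"jquery": "1.12.4", "lodash": "3.10.1", "moment": "2.11.2"}


def lock_drift(installed: dict, lock: dict) -> dict:
    """Map package -> (locked, installed) for every mismatch or missing entry."""
    drift = {}
    for name in sorted(set(lock) | set(installed)):
        locked, actual = lock.get(name), installed.get(name)
        if locked != actual:
            drift[name] = (locked, actual)
    return drift


installed = {"jquery": "1.12.4", "lodash": "4.6.1", "moment": "2.11.2"}
print(lock_drift(installed, LOCK))
```

An empty result means the environment matches what was reviewed; anything else should fail the build rather than silently ship an unreviewed upgrade.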
Upstream updates can add bugs just as easily as bug fixes.
That's an argument for having an effective review and testing process. Change is inevitable and it's better to be good at doing it routinely than putting it off until an emergency.
This is overhead for every update of every library. In theory it's a great idea, but an expensive one, so of course nobody does it.

There are two mindsets in coding: this code needs to work right now, and this code needs to work in 20 years. Code that links to external dependencies is very likely to break in the second time frame. Public APIs are generally unstable, services go away, and people break things. But if all you need is a toy demo, then feel free.

I'm talking about integration tests, not 100% coverage of someone else's code. If you need e.g. image decoding, you need to be able to update libjpeg, etc. ASAP after a security patch – and that only requires a simple integration test covering known input / output for the subset of features you support. Since it's automated, there's very little difference between multiple small releases and infrequent large ones from this perspective.
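A minimal sketch of such an integration test, in Python: a table of golden cases (fixed encoded input, expected decoded output) covering only the features the project relies on, run after every dependency update. The standard `base64` module stands in for a decoder like libjpeg here; a real suite would use small sample image files and compare decoded pixels or checksums.

```python
import base64

# Golden cases for the (hypothetical) subset of features this project uses:
# (encoded input, expected decoded output, uses the URL-safe alphabet?)
GOLDEN_CASES = [
    (b"aGVsbG8=", b"hello", False),
    (b"TWFu", b"Man", False),
    (b"-_8=", b"\xfb\xff", True),
]


def run_integration_checks():
    """Decode every golden input and collect any mismatches."""
    failures = []
    for encoded, expected, urlsafe in GOLDEN_CASES:
        decode = base64.urlsafe_b64decode if urlsafe else base64.b64decode
        actual = decode(encoded)
        if actual != expected:
            failures.append((encoded, expected, actual))
    return failures


print(run_integration_checks() or "all golden cases pass")
```

Because the check is automated and cheap, it runs the same way for a one-line security patch as for a large upgrade.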

As for your second point, I think you're focused on the wrong area. Both dynamically linked and static code demonstrably have many problems over that time period. If you recompile, you have to maintain an entire toolchain and every dependency over a long period; if you don't, you're almost certainly going to need to deal with changing system APIs, hardware, etc. Linking doesn't do a thing to make a 20-year-old Mac app harder to run. In both cases, emulation starts to look quite appealing (IBM has, what, half a century with that approach?), and once you're doing that, the linker is a minor bit of historical trivia.

Though they also eased deployment and updates.