Hacker News
by csande17 1608 days ago
Given the number of times we have failed to learn the lesson "downloading code from untrusted sources and running it is a bad idea" -- the log4j and NPM colors fiascos spring to mind -- I think it's fair to conclude that this industry is completely incapable of learning anything, ever.

Ignoring the fact that basing one's opinion of an entire industry on two "fiascos" seems drastic at best, who can we trust if we suddenly can't trust organizations like Apache? Do you trust the Linux Foundation?

It's almost like the issue is not that code is available, but how people use the code that's available, and no one seemingly likes funding open source code.

Everyone seems to be accepting the premise but I’ll reject it. For-pay software has lots of bad security vulnerabilities too. SolarWinds is an example. Windows and Office exploits. Browser 0-days. Etc etc

It’s almost like software is extremely complex and security is very hard in general. You’re always going to make some kind of trade off.

The problem is that we as humans don’t know how to correctly estimate risks like security risks. That means it’s not priced in when you ask “should I incorporate software package X into my build?” or “should I automatically take updates from my upstream?”. There are no good answers here either. Ultimately you need to be careful about which dependencies you take on, which ones need to be kept up to date, and which ones should be pinned (but even in the best-case scenario issues will occur).
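To make the "which ones should be pinned" question concrete, here is a minimal sketch of a check that flags dependencies declared with a version range instead of an exact version. It assumes an npm-style manifest already parsed into a dict; the function name `unpinned_dependencies` and the package `example-test-runner` are made up for illustration.

```python
import re

# Exact versions only: "1.3.0" passes; "^1.4.0", "~1.4.0", "*" and "latest" do not.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def unpinned_dependencies(manifest: dict) -> list:
    """Return the sorted names of dependencies whose spec is not an exact version."""
    loose = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                loose.append(name)
    return sorted(loose)


# Hypothetical manifest, shaped like the relevant parts of a package.json.
manifest = {
    "dependencies": {"colors": "^1.4.0", "left-pad": "1.3.0"},
    "devDependencies": {"example-test-runner": "latest"},
}

print(unpinned_dependencies(manifest))  # ['colors', 'example-test-runner']
```

This only looks at direct dependencies. Transitive ones are normally frozen by a lockfile, and pinning trades surprise breakage for the risk of sitting on a known-vulnerable version, which is the trade-off described above.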

> and no one seemingly likes funding open source code.

I’m not sure how this meme got started but it’s toxic. Why does free software need funding? Free software needs contributions. Big corporations make contributions by paying engineers. Everyone benefits in this ecosystem.

> I’m not sure how this meme got started

How? Look at the multitude of projects and see that most people using the software are not contributing back, with time, money or anything else.

> Why does free software need funding? Free software needs contributions

You're saying the same thing: "contributions" is one way of funding projects. "Funding" doesn't just mean money; it also means contributing engineering hours, security audits or any other way of contributing back.

But without any funding (money, time and/or effort), it's really hard to do security audits, for example, since that's expertise many developers don't have and don't get to learn on the job.

How is it toxic to see how little everyone who uses open source/free software is contributing back to the projects they use?

Why do most users need to contribute? The value is the ecosystem. The point is we don’t have to contribute to everything we use. We can build on the work of others and they can build on ours.

Even developers of common libraries are relying on an amount of open code so immense they couldn’t possibly make contributions to all of it. This is the beauty of free software.

> We can build on the work of others and they can build on ours.

This only works if both parties publish. Otherwise it's just "we can build on the work of others".

Many companies seem to be able to benefit in various ways from contributing to open source / free software.

Examples:
- Chromium and Android obviously benefit Google, as they make it easier to ensure ads get through.
- They also limit the ability of Apple/Microsoft to control those revenue streams in their walled gardens.
- Hardware and software vendors benefit from making sure Linux works well with their products.
- Making TensorFlow free helps build a community that in turn makes hiring easier.
- Contributing to Torch may protect against a monopoly.
- Contributing to other R or Python machine learning tools may help limit the power of companies like SAS or IBM/SPSS.
- Similarly, contributions to Postgres/Mongo etc. wrest power away from Oracle, MS (MSSQL) and IBM (DB2).
- More of the same: Proton vs DirectX, OpenCL vs CUDA, FidelityFX vs DLSS. When a competitor tries to establish a standard that is either paid for or limited or proprietary in some other way, providing or contributing to open alternatives may be easier than providing a direct proprietary competitor.
- Databricks founders benefit from being part of the creation of Spark, and can get paid for adding further value.

Many of the above are cases where large to huge corporations use their power to disrupt competitors by providing free alternatives in areas where the competitor has some market dominance. Other contributions assist in delivering a basic product for free while getting paid for products that add value on top.

Every individual in the open source community is receiving more benefit than they could ever personally contribute. Keeping score is pointless. The fact is large corporations use open source software and also contribute to the ecosystem, just like anyone else. This is fine.

‘Free’ literally means that you don’t need to contribute back.

If a contribution is required, then it’s not free.

I find that usually there's just one person (or a very small group) with a vision, motivation, and the skills required to take the project in a good direction and keep maintaining it. Contributions from others tend to be fixes and features of limited scope, essentially drive-by contributions, not enough to keep the project going.

Funding would help ensure that those who have the skills and motivation and vision can keep working on it.

It's also a constraint-setting problem. No one deploying software with log4j would've said "yeah, the logging system should be able to reach any IP address at all if asked to by external input."

But we lack a decent way to express that sort of data flow constraint when deploying software.
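As a rough illustration of what such a constraint could look like if it were written down, here is a sketch of a per-component egress allowlist. Everything in it is hypothetical: the `EGRESS_POLICY` table, the component names, and the `.example` hosts are assumptions for the example, not an existing tool or API.

```python
from urllib.parse import urlsplit

# Hypothetical deployment policy: which outbound hosts each component may contact.
EGRESS_POLICY = {
    "logging": {"logs.internal.example"},   # the logger may only reach the log collector
    "payments": {"api.payments.example"},
}


def egress_allowed(component: str, url: str) -> bool:
    """Return True only if the URL's host is on the component's allowlist."""
    host = urlsplit(url).hostname
    return host is not None and host in EGRESS_POLICY.get(component, set())


print(egress_allowed("logging", "https://logs.internal.example/ingest"))  # True
print(egress_allowed("logging", "ldap://attacker.example:1389/a"))        # False
```

A check inside the application like this is only illustrative. To actually hold, the same policy would have to be enforced somewhere the component cannot bypass, such as a firewall or sandbox, which is exactly the part that is awkward to express today.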

The NPM colors fiasco was something we should have learned not to allow to repeat -- after the left-pad fiasco. The fact that we keep stepping on rakes and getting smacked in the face like that is the problem here.

I dunno what you mean, unmoderated repositories were the design goal of NPM.

When NPM launched, I was (and to this day remain) among the people voicing a preference for the philosophy that goes into maintaining (e.g.) the Debian repositories. But some people want a package source with no gating mechanisms.

Of course there are many options for how and when to gate that lie somewhere between Debian's approach and a fully unmoderated one. But when that case was made, I was informed we were old fogies out of touch with the modern pace of development. So as far as I can tell these "fiascos" as you call them are NPM operating exactly as intended.

I'm saying, we should have learned that was a shitty design goal, and put more stringent checks in place to ensure a single upstream developer can't ratfuck literally everyone's Node app, especially since Node has moved beyond being a startup toy and is now critical IT infrastructure for major corporations.

The Go ecosystem is still fucking clownshoes in so many ways, but even they managed to pivot away from "depend directly on whatever random developers barf onto GitHub". The Node ecosystem, by comparison, evinced all the problem awareness of the "this is fine" dog.

I feel like the actual problem behind this is the lack of a useful definition of what "trusted" and "untrusted" mean that does not resolve to assigning blame for problems that have already happened.

I feel like "literally any URL supplied by anyone capable of visiting your website" and "some random guy from the Internet, with no connection to you or your company whatsoever, who was recently arrested for trying to burn his own house down" are both fairly obvious examples of sources from which you should not download and run random code without checking it first.

But maybe that's the part that this industry is incapable of learning.

Not only that, but we're now creating devices that depend on remote servers. It's completely obvious what the downsides to this are, yet we embrace it without question.

And why are we moving to apps that only work online? Networks are slow compared to desktops. Programming seems to be the art of doing the worst thing possible. Our computers are getting faster and faster, but we're relying on communications that are vastly slower.

And another thing: Windows updates, or even Firefox for that matter. Software shouldn't be so full of problems that you need to constantly update it. Just update it every 2 years. Sure, you get the supposed latest and greatest, but updates are a messy process.

Perhaps the difference between programming and "real" engineering is this: parsimony. In engineering, you have to do more with less. In programming, the attitude seems to be to shovel more spaghetti onto the plate.

Update: I'll add a further point. It's not just parsimony, it's also cost of errors. In physical products, a flawed design that makes it into production is costly. So you have to get it right. With software, you can afford a slap-dash approach. And that's what we see.

> Not only that, but we're now creating devices that depend on remote servers. It's completely obvious what the downsides to this are, yet we embrace it without question.

But there are upsides as well: the devices are usually attached to services for which the device acts as a conduit. If the service is valuable, you can sell devices and keep collecting money after first sale, driving huge margins. Didn't Hackernews post a lot of Fs in the chat for the original BlackBerry service -- one of the first devices of this kind to reach a mass audience -- once it was shut down?

I think it's preposterous to expect that software that has millions of lines of code (like Windows or Firefox) is perfect. Software engineers are humans as well and we do make mistakes.

You could argue that we should adopt the same testing strategy as mission-critical pieces of logic, where 1 LOC has 1+ LOC of test code, aim for 100% coverage, ... But then your Windows license would cost thousands of dollars.

And about your final remark: what you call real engineering has to cope with well-defined requirements, and the context is well known. You know beforehand the maximum weight a bridge should be able to support. You know the configuration and properties of the soil where the bridge will be built, and you have good estimates of the magnitude of 1-in-100-year extreme events.

Software doesn't have that luxury. If we build a web server to handle 1k qps, it is still somewhat likely that said server will face spikes of 10k qps. Try to do that with the bridge mentioned above. Forget failing gracefully. Additionally, users have few ways to use the bridge: you cross it by car, by bus or on foot. The same web server might face someone trying to send a payload of GBs where you would expect a few KBs. That's at least partly why code is messy. The space of possibilities is much greater and we somehow need to write software that still works.
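For what "failing gracefully" under those two conditions might mean in practice, here is a minimal sketch of an admission check that rejects oversized payloads and sheds load above a design capacity. The class name `Admission`, the 64 KB limit, and the fixed one-second window are assumptions chosen for illustration; real servers usually use more careful rate limiting.

```python
import time

MAX_BODY_BYTES = 64 * 1024  # assumed limit: a few KBs expected, with some headroom
MAX_QPS = 1000              # assumed design capacity, matching the 1k qps figure above


class Admission:
    """Decide per request whether to serve it, using a fixed one-second window."""

    def __init__(self, max_qps=MAX_QPS, max_body=MAX_BODY_BYTES, clock=time.monotonic):
        self.max_qps = max_qps
        self.max_body = max_body
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def admit(self, content_length: int) -> int:
        """Return an HTTP status: 200 to proceed, 413 or 503 to reject early."""
        if content_length > self.max_body:
            return 413  # payload too large: refuse before reading the body
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new one-second window begins; reset the request counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.max_qps:
            return 503  # over capacity: shed load instead of falling over
        self.count += 1
        return 200


gate = Admission()
print(gate.admit(2_000))          # 200
print(gate.admit(5_000_000_000))  # 413
```

The point is only that the 10x spike and the GB-sized payload both have to be anticipated and handled explicitly, which is extra code the bridge never needs.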

> The space of possibilities is much greater and we somehow need to write software that still works.

But the cost of failure is much lower, and that's why we as an industry can be so mediocre.

Physical engineering organizations have profit margins in the single digits to low tens. Software has margins in the 50s to 80s and marginal unit costs of zero. If any physical engineering organization had employees of the skill of our current SaaS software market, they'd be out of business immediately because no other industry but software can absorb such frequent and dramatic failures.