by zahlman 271 days ago
Why can't it deduplicate matching licenses?
5 comments

That's how it is done in debian packages. The full text of each license is only mentioned once and given an identifier which is then used to link the license to the relevant copyright statements.

For example: https://salsa.debian.org/debian/highlight/-/blob/94ee6559155...
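To make the idea concrete, here is a minimal sketch of that kind of deduplication. It is not how Debian's tooling is implemented; the function name `dedupe_licenses`, the input shape, and the choice of a truncated SHA-256 digest as the identifier are all assumptions made for illustration.

```python
import hashlib


def dedupe_licenses(packages):
    """Store each distinct license text once and link copyright lines to it.

    `packages` is a hypothetical list of (copyright_statement, license_text)
    pairs. Returns (texts, entries): `texts` maps an identifier to the full
    license text, and `entries` pairs each copyright statement with the
    identifier of its license.
    """
    texts = {}
    entries = []
    for copyright_statement, license_text in packages:
        # Assumption: a truncated content hash serves as the identifier.
        ident = hashlib.sha256(license_text.encode("utf-8")).hexdigest()[:12]
        texts.setdefault(ident, license_text)  # keep the first copy only
        entries.append((copyright_statement, ident))
    return texts, entries
```

This only merges texts that are byte-for-byte identical, so two MIT licenses with different names embedded in them would still count as two texts.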

The legal department doesn't want to take that chance.
Lawyers can make mistakes, but to REALLY mess things up, you need lawyers, plus some engineers who take the lawyers too seriously.
The worst companies to work for are bad at differentiating risks, especially the ones that entertain even the most remote legal risks. It seems to happen more with legal risks than with security or technology risks.
That holds true for basically every hardcore expert. They might be wildly smart in their domain… and that is it.
I think it might be the case that licenses often include the authors’ names in the “this code is copyright of so-and-so” (as you can see, I Am Not A Lawyer) section, which might be considered part of the text of the license, thereby making it a requirement to include the full license text for each dependency.
It’s usually done in MIT-like licenses, which are quite short.

But I’d argue that replacing it with

    Copyright (c) 207X Jonathan Fenimore
    Licensed MIT, see the license text below

or even

    Copyright (c) 207X Jonathan Fenimore
    SPDX-License-Identifier: MIT

should be enough, but IANAL too.

---

In longer licenses like GPL or Apache, you are not supposed to change any copyright statement placeholders. For example, there’s this line in the GPL text:

    Copyright (C) <year>  <name of author>

But it’s a part of the “How to Apply These Terms to Your New Programs” section. You are supposed to copy it into your code and fill it out there instead.

---

Or they could just compress the license amalgamation! I think it would be a bit bigger but pretty reasonable, and their lawyers should be happy with this arrangement.

Are you sure it doesn't*?

* When we treat different versions of, say, the MIT license, with different names and copyright years inserted, as different licenses.
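A rough sketch of how those variants could be folded together: pull the copyright lines out, then compare what is left. The regex is a heuristic I made up for illustration (it only catches lines starting with "Copyright (c)", "Copyright ©" or "Copyright <year>"), and `split_notice` and `body_key` are hypothetical names. Whether this is legally sufficient is a separate question.

```python
import hashlib
import re

# Heuristic (assumption): a copyright notice is a line that starts with
# "Copyright" followed by "(c)", "©", or a four-digit year.
COPYRIGHT_LINE = re.compile(
    r"^[ \t]*copyright[ \t]+(?:\(c\)|©|\d{4}).*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_notice(license_text):
    """Return (copyright_lines, body) with the notices removed from the body."""
    notices = [m.group(0).strip() for m in COPYRIGHT_LINE.finditer(license_text)]
    body = COPYRIGHT_LINE.sub("", license_text)
    # Collapse all whitespace so re-wrapped copies of a text compare equal.
    body = " ".join(body.split())
    return notices, body


def body_key(license_text):
    """Hash of the license body only, ignoring the copyright notices."""
    _, body = split_notice(license_text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Two MIT texts that differ only in name and year then get the same `body_key`, and the per-author notices can be listed separately next to a single copy of the body.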

I have to imagine the file would compress extremely well though... I'm more curious why they don't use compression.
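This is easy to check on a file you have lying around. A small sketch using Python's standard `zlib` module; the helper name `compression_ratio` and the idea of joining the texts with blank lines are my own assumptions, not anything the actual license file does.

```python
import zlib


def compression_ratio(license_texts):
    """Return compressed size divided by original size for the joined texts."""
    blob = "\n\n".join(license_texts).encode("utf-8")
    packed = zlib.compress(blob, 9)  # 9 = strongest DEFLATE setting
    # Sanity check: the data must survive a round trip unchanged.
    assert zlib.decompress(packed) == blob
    return len(packed) / len(blob)
```

Feed it a list of near-identical license texts and the ratio should come out small, since DEFLATE replaces repeated passages with back-references. Note that DEFLATE only looks back about 32 KiB, so copies that sit far apart in a large file will not benefit as much.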

Not sure why Apple doesn't offer a compressed filesystem :p It makes writes a bit slower when compression fails, but otherwise the savings in I/O time often make up for the increased processing on read and write.
I imagine it does precisely that when gzipped for distribution.