by seunosewa 370 days ago
I disagree. Every Python package we install seems to install dozens of libraries, each of which could harbour malware. Many of them are only used for a single function within them. We have no idea what most of the packages are for. It's a lot.
5 comments

https://en.m.wikipedia.org/wiki/Log4j https://en.m.wikipedia.org/wiki/Npm_left-pad_incident

Languages and domains that have leaned too far into package managers and small libraries are prone to fragility and security nightmares.

For any "serious" application of critical code, every library used needs to be vetted and verified to be maintained and secure.

I'd much rather deal with a bug in our code than a deprecated library or a breaking version update.

If we are to use a library outside of standard Unix or the stdlib within my field, better expect a nightmarish code review and a meeting.

Besides being fun, implementing it ourselves improves our skill level for the future. Something vibe coding itself goes against as well.

> For any "serious" application of critical code, every library used needs to be vetted and verified to be maintained and secure.

A project only becomes serious once legal is breathing down engineering's neck. Before that, it's usually the Wild West. After, it becomes a security circus trying to patch the technology deficiency (custom registries, complex linting and other analysis tooling, ...)

If it's open source, it may be possible to create your own fork to fix issues.
This. We finally have a tool that can learn from all the libraries and abstractions that have to fit everybody's needs (and do so badly because there is no free lunch), and extract just the parts that are actually relevant to our problem and domain. This allows you to not only produce a much smaller attack surface, but also allows for domain specific optimisations and shortcuts.

It's kinda like project specific semantic monomorphization.

Sure. Lots more debugging than using something battle tested, which is why I have this in my CLAUDE.MD:

> If there is a battle tested, well known package that can help us, then recommend it BEFORE implementing large swaths of custom code.

This is hilarious.
You're right. I didn't fully read what the OP was saying, which is genius; and my response was more towards the article.
I didn't get this either so let me try to explain as ChatGPT did to me:

Monomorphization means taking a generic function and generating a version specific to the type being used, e.g. a Rust function

  fn identity<T>(x: T) -> T {
      x
  }  
can be compiled into one version for i32 and one for String, which is more efficient since the compiler knows the types:

  fn identity_i32(x: i32) -> i32 { x }
  fn identity_string(x: String) -> String { x }
Semantic monomorphization could mean extracting the parts of the library that are meaningful to generate problem-specific concrete code: instead of importing pandas to do

  import pandas as pd
  df = pd.DataFrame([{"a": 1}, {"a": 2}])
  total = df["a"].sum()
The LLM might skip the import entirely and generate only:

  data = [{"a": 1}, {"a": 2}]
  total = sum(d["a"] for d in data)
If I understood right, the parent found it funny that a comment suggesting we could never use libraries because we can concretize the specific relevant code, would be responded to with a Claude.MD that essentially said "always use libraries instead of concrete relevant code". I missed it because I didn't stop to look up "monomorphization", so I hope this helps anyone else like me get the joke.
Nah, I found it hilarious that any LLM would have any clue what would constitute ‘well baked’ in the context, or that any of this was going to end well.
>This allows you to not only produce a much smaller attack surface

Why does this reduce your attack surface? Can the functions in the library, unrelated to the ones you're using, be triggered by user input somehow?

It's about the functions you _do_ call. Those probably have a larger scope than your specific use case. Worse, they have to support the superposition of all their users' use cases.

Let's say you have a library that does arbitrary Unicode string verification, but your code only ever works with strings of a short, bounded length (e.g. 32 bytes). An LLM could write you vectorized verification instructions for just that case.
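
To make the narrowing concrete, here is a rough sketch of a check specialised to one use case. The 32-byte bound comes from the comment above; the restriction to printable ASCII and the names `MAX_LEN` and `is_valid_short_ascii` are assumptions made purely for illustration. It is plain Python, not the vectorized (SIMD) code the comment has in mind, which would more likely be written in a lower-level language.

```python
MAX_LEN = 32  # assumed bound, taken from the "32 bytes" example above


def is_valid_short_ascii(raw: bytes) -> bool:
    """Accept only inputs of at most MAX_LEN bytes made of printable ASCII.

    Hypothetical specialised validator: it rejects everything a general
    Unicode validator would have to reason about (multi-byte sequences,
    normalisation, unbounded length).
    """
    if len(raw) > MAX_LEN:
        return False
    # 0x20..0x7E is the printable ASCII range (space through tilde).
    return all(0x20 <= byte <= 0x7E for byte in raw)
```

The point is not that this is better than a general validator, only that the code path an attacker can reach is much smaller when the inputs are known to be this constrained.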

LOL! I thought the article was going to be about reading books and ChatGPT!

And yes, I agree.

https://www.npmjs.com/package/boolean

>converts lots of things to boolean.

>3 million weekly downloads

This is insane.

3 million weekly downloads for a package that is “deprecated” and the source repo no longer exists. Truly insane.
Even if it wasn't deprecated this is literally

    ['yes', 'y', '1'].indexOf(input.toLowerCase()) !== -1
People adding a dependency to avoid writing one line of code...
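For comparison, a rough Python equivalent of the one-liner above might look like this. The accepted values are the three from the snippet; the names `TRUTHY` and `to_bool` are made up for illustration, and this is not a claim about what the npm package actually does beyond what the comment shows.

```python
# Hypothetical helper mirroring the JavaScript one-liner above.
TRUTHY = frozenset({"yes", "y", "1"})


def to_bool(value: str) -> bool:
    """Return True only for the truthy strings listed in TRUTHY (case-insensitive)."""
    return value.lower() in TRUTHY
```

For example, `to_bool("YES")` is true and `to_bool("no")` is false.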
I guess some people were taught that the Not-Invented-Here Syndrome is a bad thing and needs to be avoided at all costs. And they took it literally.

I had a similar situation at a former job once, where as a subtask of some task we basically needed to implement a "partial deep copy" for a few specific classes (deep-copy some selected properties, shallow-copy the rest), which could either be implemented in ~50 lines of Java code, or by adding an extra library that provided this functionality using a domain-specific language, plus many other things, some of them potential vulnerabilities... and it took a lot of time to convince my team leader that "reinventing the wheel" is the right thing to do in this case.

"Why do you want to program something that already exists? Don't you realize that every line of code you write is a line of code someone else will have to maintain in the future?" Yeah, good points, but it's not like using a library is without costs either: you need to scan for vulnerabilities, bump versions, and sometimes the API changes; in the long term that is a lot of work to do just to avoid writing 50 lines of code.
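
The "partial deep copy" described above (deep-copy a few selected properties, shallow-copy the rest) can be sketched in a few lines. The original was Java; this is a Python illustration only, and `partial_deep_copy`, `Order`, and the field names are all hypothetical.

```python
import copy


def partial_deep_copy(obj, deep_fields):
    """Shallow-copy obj, then replace the named attributes with deep copies."""
    clone = copy.copy(obj)  # new object; attribute values are still shared
    for name in deep_fields:
        setattr(clone, name, copy.deepcopy(getattr(obj, name)))
    return clone


class Order:
    """Hypothetical example class with one field to deep-copy and one to share."""

    def __init__(self, items, customer):
        self.items = items
        self.customer = customer
```

With `clone = partial_deep_copy(order, ["items"])`, the clone gets its own independent `items` structure while `customer` still refers to the same object as the original.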

This is the total leopards-eating-faces moment from all the greybeards.
Same with Ruby, I have to use a package with 230 dependencies.
This sounds exactly like under-utilization: if someone needs only a function or two from a library, I guess making yourself dependent on a third party for such a small gain doesn't make sense.