Hacker News
by xg15 665 days ago
Micro libraries are worse than no libraries at all - but I maintain they are still better than gargantuan "frameworks" or everything-but-the-kitchen-sink "util"/"commons" packages, where you end up using only a tiny fraction of the functionality but have to deal with the maintenance cost and attack surface of the whole thing.

If you're particularly unlucky, the unused functionality pulls in transitive dependencies of its own - and you end up with libraries in your dependency tree that your code is literally not using at all.

If you're even more unlucky, those "dead code" libraries will install their own event handlers or timers during load or will be picked up by some framework autodiscovery mechanism - and will actually execute some code at runtime, just not any code that provides anything useful to the project. I think an apt name for this would be "undead code". (The examples I have seen were from Java frameworks like Spring and from webapps with too many autowired request filters, so I do hope that this is not such an issue in JS yet.)
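To make the "undead code" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `REQUEST_FILTERS` registry, the `request_filter` decorator, the `unused_library` module, and `simulate_import`, which stands in for a real `import` by executing the module body once. It is not how Spring or any specific framework works, only the general shape of registration-on-load.

```python
import types

# Hypothetical framework-wide registry that autodiscovery fills in.
REQUEST_FILTERS = []


def request_filter(func):
    """Hypothetical decorator: registers func the moment it is defined."""
    REQUEST_FILTERS.append(func)
    return func


def handle_request(path):
    """Run every registered filter and return the names of those that ran."""
    ran = []
    for registered in REQUEST_FILTERS:
        registered(path)
        ran.append(registered.__name__)
    return ran


# Source of a hypothetical transitive dependency that the application never calls.
UNUSED_LIBRARY_SOURCE = '''
@request_filter
def audit_filter(path):
    pass  # runs on every request, does nothing useful for this project
'''


def simulate_import(name, source):
    """Stand-in for `import name`: execute the module body once, as import would."""
    module = types.ModuleType(name)
    # Assumption: the library would normally get this via `from framework import ...`.
    module.request_filter = request_filter
    exec(source, module.__dict__)
    return module


if __name__ == "__main__":
    print(handle_request("/index"))  # nothing registered yet
    simulate_import("unused_library", UNUSED_LIBRARY_SOURCE)
    print(handle_request("/index"))  # the never-called library now runs per request
```

The application never references `audit_filter`, yet after the load it runs on every request.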

2 comments

> but I maintain they are still better than gargantuan "frameworks" or everything-but-the-kitchen-sink "util"/"commons" packages, where you end up only using a tiny fraction of the functionality but have to deal with the maintenance cost and attack surface of the whole thing.

Indeed. Several toy projects I've done were blown up in size by four orders of magnitude because of NumPy.

I only want multi-dimensional arrays that support reshaping and basic element-wise arithmetic, maybe matrix multiplication; I'm not even that concerned about performance.

But I have to pay for countless numerical algorithms I've never even heard of, provided by decades-old C and/or FORTRAN projects, plus even more higher-math concepts implemented in Python, NumPy's extensive (and fragmented - there's even compiled code for testing that's outside of any test folders) test suite that I'll never run myself, a bunch of backwards-compatibility hacks completely irrelevant to my use case, a Python-to-Fortran interface wrapper generator, a vendored copy of distutils even in the wheel, over 3MiB of .so files for random number generators, a bunch of C header files...

[Edit: ... and if I distribute an application, my users have to pay for all of that, too. They won't use those pieces either; and the likelihood that they can install my application into a venv that already includes NumPy is pretty low.]

I know it's fashionable to complain about dependency hell, but modularity really is a good thing. By my estimates, the total bandwidth used daily to download copies of NumPy from PyPI is on par with that used to stream the Baby Shark video from YouTube - assuming it's always viewed in 1080p. (Sources: yt-dlp info for file size; History for the Wikipedia article on most popular YouTube videos; pypistats.org for package download counts; the wheel I downloaded.)

Sometimes importing zombie "undead code" libraries can be beneficial!

I just refactored a bunch of Python computer vision code that used detectron2 and yolo (both of which indirectly use OpenCV and PyTorch and lots of other stuff), and in the process of cleaning up unused code, I threw out the old imports of the yolo modules that we weren't using any more.

The yololess refactored code, which really didn't have any changes that should measurably affect the speed, ran a mortifying 10% slower, and I could not for the life of me figure out why!

Benchmarking and comparing each version showed that the yololess version was spending a huge amount of time with multiple threads fighting over locks, which the yoloful code wasn't doing.

But I hadn't changed anything relating to threads or locks in the refactoring -- I had just rearranged a few of the deck chairs on the Titanic and removed the unused yolo import, which seemed like a perfectly safe innocuous thing to do.

Finally after questioning all of my implicit assumptions and running some really fundamental sanity checks and reality tests, I discovered that the 10% slow-down in detectron2 was caused by NOT importing the yolo module that we were not actually using.

So I went over the yolo code I was originally importing line by line, and finally ran across a helpfully commented top-level call to fix an obscure performance problem:

https://github.com/ultralytics/yolov5/blob/master/utils/gene...

    cv2.setNumThreads(0)  # prevent OpenCV from multithreading (incompatible with PyTorch DataLoader)

Even though we weren't actually using yolo, merely importing it executed that one line of code, which fixed a terrible multithreading performance problem with OpenCV and PyTorch DataLoader fighting over locks behind the scenes.

So I copied that magical incantation into my own detectron2 initialization function (not as top level code that got executed on import of course), wrote some triumphantly snarky comments to explain why I was doing that, and the performance problems went away!
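A minimal sketch of what such an explicit initialization step might look like. The function name `configure_opencv_threads` and its `cv2_module` parameter are hypothetical (the parameter exists only so the function can be exercised without OpenCV installed); `cv2.setNumThreads` is the call quoted from the yolov5 source above. This is not the actual code from my project.

```python
import importlib


def configure_opencv_threads(num_threads=0, cv2_module=None):
    """Set OpenCV's thread count explicitly instead of relying on an import side effect.

    Returns True if the setting was applied, False if OpenCV is not installed.
    """
    if cv2_module is None:
        try:
            cv2_module = importlib.import_module("cv2")
        except ImportError:
            return False
    # 0 prevents OpenCV from multithreading, per the yolov5 comment quoted above
    # (its multithreading is described there as incompatible with PyTorch DataLoader).
    cv2_module.setNumThreads(num_threads)
    return True


if __name__ == "__main__":
    # Call once from your own startup code, before creating any DataLoader.
    print("applied:", configure_opencv_threads(0))
```

Because the call lives inside a function, importing this file changes nothing; the setting only takes effect when the caller asks for it.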

The regression wasn't yolo's or detectron2's fault per se, just an obscure invisible interaction of other modules they were both using, but yolo shouldn't have been doing anything globally systemic like that immediately when you import it without actually initializing it.

But then I would have never discovered a simple way to speed up detectron2 by 10%!

So if you're using detectron2 without also importing yolo, make sure you set the number of cv2 threads to zero or you'll be wasting a lot of money.

This is mortifying. It should not be acceptable for imports to implicitly run code simply by existing.

I know, right??! I was alternating between banging my head against the wall in despair and jumping for joy that I had accidentally found a way to speed up detectron2 by 10%. I'll take the win.

YOLO: You Only Load Once

That’s just the way Python works, though: an import reads a script line by line, defining the functions and executing calls. I think that’s true of any scripting language?

How it works and how it should be used are different.
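For anyone who wants to see the mechanics, here is a small sketch. The module name `demo_side_effect` and the `DEMO_IMPORT_COUNT` environment variable are made up for the demo. It writes a throwaway module to a temp directory and imports it twice: the top-level statement runs on the first import, the function body does not run at all, and the second import is served from the `sys.modules` cache.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical module: one top-level side effect plus one function definition.
MODULE_SOURCE = '''\
import os

# Top-level statement: executes as soon as the module is first imported.
count = int(os.environ.get("DEMO_IMPORT_COUNT", "0")) + 1
os.environ["DEMO_IMPORT_COUNT"] = str(count)


def helper():
    # Function body: executes only when somebody calls helper().
    return "helper called"
'''


def import_demo_module():
    """Import the demo module twice; return (same module object?, run count)."""
    os.environ.pop("DEMO_IMPORT_COUNT", None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_side_effect.py").write_text(MODULE_SOURCE)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            sys.modules.pop("demo_side_effect", None)
            first = importlib.import_module("demo_side_effect")
            # Second import hits the sys.modules cache; the body is not re-run.
            second = importlib.import_module("demo_side_effect")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_side_effect", None)
    return first is second, os.environ["DEMO_IMPORT_COUNT"]


if __name__ == "__main__":
    same_object, run_count = import_demo_module()
    print("same module object:", same_object)
    print("top-level code ran", run_count, "time(s)")
```

The environment variable changes even though nothing ever calls `helper()`, which is the same kind of effect the `cv2.setNumThreads(0)` line had.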

Say “no” to import side‐effects in Python:

https://news.ycombinator.com/item?id=7536246

https://chrismorgan.info/blog/say-no-to-import-side-effects-...

>I must be able to import your module— any module— at any time without anything breaking.

Or even the beneficial side effect of some other module running 10% faster. ;)

What side-effects, if any, are okay when importing a python module?

https://softwareengineering.stackexchange.com/questions/4540...

>Q: Is this a better design pattern or does it just kick the issue down the road?

>A: Kicking the issue down the road is the basic idea of the "functional core, imperative shell" design pattern.

Functional Core, Imperative Shell (2012):

https://news.ycombinator.com/item?id=34860164

https://www.destroyallsoftware.com/screencasts/catalog/funct...

Well yeah, I wouldn't hold it that way either. The number of PRs I've commented on to take side effects out of imports is like 75% of all the PRs I've seen in my current job.

Maybe Python 4 can make anything except a function definition or import be a syntax error in a module haha.
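You don't have to wait for Python 4 to get a rough version of that check. Below is a sketch of a tiny linter built on the standard `ast` module. The function name `find_top_level_side_effects` and the allow-list are my own assumptions: besides imports and function definitions, it also lets class definitions and bare string literals (docstrings) through, and it flags everything else at module level, including plain constant assignments.

```python
import ast

# Statement types tolerated at module level (an assumption; adjust to taste).
ALLOWED_NODES = (
    ast.Import,
    ast.ImportFrom,
    ast.FunctionDef,
    ast.AsyncFunctionDef,
    ast.ClassDef,
)


def find_top_level_side_effects(source):
    """Return (line number, node type) for each disallowed top-level statement."""
    findings = []
    for node in ast.parse(source).body:
        if isinstance(node, ALLOWED_NODES):
            continue
        # A bare string literal at module level is a docstring; let it pass.
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            continue
        findings.append((node.lineno, type(node).__name__))
    return findings


# The source is only parsed, never executed, so cv2 need not be installed.
EXAMPLE_SOURCE = '''import cv2

def detect(image):
    return image

cv2.setNumThreads(0)
'''

if __name__ == "__main__":
    for line, kind in find_top_level_side_effects(EXAMPLE_SOURCE):
        print(f"line {line}: top-level {kind}")
```

It is deliberately crude: decorators and class bodies still execute at import time and would slip past it.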

C libraries compiled with -ffast-math would like a word…