Hacker News
by jebarker 114 days ago
Reading this was a good reminder not to be intimidated by assumptions about complexity. (Without giving it much thought) I would have assumed that it would be hard to replace malloc for such fundamental applications as ls, but it's surprisingly simple.
4 comments

There's usually an easy-ish way to override malloc/calloc/realloc/free on Unix, as it's very useful to do when debugging issues or just to collect allocation metrics.

In ELF objects (e.g. on Linux) this is usually done with "weak" symbol binding. This is an optional flag on symbols in the ELF format that lets you override a symbol by providing a competing non-weak symbol, which the linker prefers when there is a conflict. https://en.wikipedia.org/wiki/Weak_symbol

You can see the list of weak symbols by looking for a 'W' in the output of `nm` on Linux hosts.
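
To make the weak-binding idea concrete, here is a minimal sketch, assuming GCC or Clang (the `__attribute__((weak))` syntax is a compiler extension, not standard C). The names `demo_alloc` and `demo_run` are made up for illustration and are not part of any libc. When object files are linked together, a non-weak `demo_alloc` defined in another object file would replace the weak one below; with nothing else linked in, the weak default is what runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter: how many allocations used the weak default. */
static unsigned long default_alloc_calls = 0;

/*
 * Weak definition: listed with a 'W' by `nm` on the compiled object.
 * If another object file in the same link provides a non-weak
 * demo_alloc(), the linker uses that one instead of this body.
 */
__attribute__((weak)) void *demo_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        default_alloc_calls++;
    return p;
}

/* Usage sketch: allocate once and report how many calls hit the default. */
int demo_run(void)
{
    void *p = demo_alloc(64);
    assert(p != NULL);
    free(p);
    return (int)default_alloc_calls;
}
```

Compiling this with `cc -c` and running `nm` on the object file should show `demo_alloc` marked 'W', while `demo_run` is an ordinary 'T' symbol.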

Right.

Unfortunately, a lot of system level knowledge like this is not found in a single place but spread over many articles/manuals/books/etc.

However, the book Advanced C and C++ Compiling: An Engineering Guide to Compiling, Linking and Libraries using C and C++ by Milan Stevanovic brings together a lot of information which you might find interesting.

If you started learning from the "bottom-up", you wouldn't think it's intimidating. Fortunately, it's never too late to start learning.

That might be true for this particular thing, but there’ll still be some other perceived barrier of complexity, e.g. maybe it’s hardware, maybe it’s math, maybe it’s some higher level application like graphics. My point was that I was reminded to not just assume something would be hard without looking into it.

This applies to a lot of things unfortunately. There is a cult of just being afraid and scaring other people.

"You can't do it, just use a library." "Just use this library, everyone uses it." "Even Google uses this library, do you think you are better?" etc.

To add another example to this, you will read that memcpy in libc is super mega optimized and that you shouldn't write it yourself, etc.

But just check ClickHouse [1] as an example. They implemented their own, it is pretty basic, and the comments in the code say it works well.

Also you can check musl libc code etc. and it is fairly simple.
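
For a sense of how small a basic version can be, here is a minimal byte-at-a-time sketch. It is not the ClickHouse or musl code (both do more, such as copying in larger chunks); `simple_memcpy` is a made-up name, and like the real memcpy it assumes the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copies n bytes from src to dst one byte at a time.
 * The buffers must not overlap. Returns dst, like the standard memcpy.
 */
void *simple_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    /* Our own contract: NULL is only acceptable for a zero-length copy. */
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dst;
}
```

An optimizing compiler may well turn this loop into wider loads and stores on its own, which is part of why a basic version can be less slow than you would expect.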

People still would argue that you used some intrinsic so it isn't portable or you just benchmarked on one case so it won't work well overall.

Well you CAN benchmark as wide as you want inside your project and have a different memcpy code per project. This kind of thing isn't as bad as people make it out to be in my opinion.
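
As a rough idea of what an in-project benchmark could look like, here is a small sketch. Everything in it is an assumption for illustration: the `bench_copy` name, the use of `clock()` for timing, and the GCC/Clang-style empty `asm` statement used as a compiler barrier. A real benchmark would sweep many sizes and alignments taken from your own workload.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any function with memcpy's shape can be plugged in here. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/*
 * Hypothetical harness: runs `iters` copies of `size` bytes through `fn`.
 * Returns the CPU seconds spent, or -1.0 if allocation failed or the
 * destination does not match the source afterwards.
 */
double bench_copy(copy_fn fn, size_t size, int iters)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = calloc(size, 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < size; i++)
        src[i] = (unsigned char)i;

    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        fn(dst, src, size);
        /* Compiler barrier (GCC/Clang syntax) so the copies are kept. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_t end = clock();

    int ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    return ok ? (double)(end - start) / CLOCKS_PER_SEC : -1.0;
}
```

Calling it as `bench_copy(memcpy, 4096, 100000)` and then again with your own copy function gives two numbers to compare for that one size; repeating it over the sizes your project actually copies is the "benchmark as wide as you want" part.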

Ofc memcpy is just an example here and it applies similarly to memory allocation, io etc.

As a negative note, imo this is one of the major reasons why most software is super crappy now. Everything uses some library -> those libraries change all the time -> more breakage -> more maintenance. A similar chain happens with performance, because the person who wrote the library probably doesn't even know how I am using it.

This is also why people have endless arguments about what library/tool to use when they could be learning more and more things every day.

[1] https://github.com/ClickHouse/ClickHouse/blob/master/base/gl...

100% - the number of times you will actually need a super optimized memcpy() in real life is very slim compared to the benefit you can get from looking at and writing basic versions of it for different CPUs.

Then you'll have a much better idea of when to _really_ use one that depends on intrinsics, is optimized, etc., and how to benchmark them ... those are the real skills.