| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dmytroi 1822 days ago

My 5 cents would be wrong abstraction and wrong tooling, let me elaborate a bit of them:

- Current abstractions are mostly based on POSIX threads and C++/Java memory models. I think they are poorly representing what is actually happening in hardware. For example Acquire barrier in C++ makes it really hard to understand that in hardware it equals to a flush of invalidation queue of the core, to sanity check your understanding try answering the question "do you need memory barriers if you have multithreaded application (2+ threads) running a lock-free algorithm on a single core?", correct answer is no, because same core always sees it's own writes as they would happen in program order, even if OOO pipeline would reorder them. Or threads, they seem to be an entity that can either run or stop, but in hardware there are no threads, CPU just jumps to a different point in memory (albeit through rings 3->0->3). Heck even whole memory allocation story, we have generations of developers thinking about memory allocation and it's safety, yet hardware doesn't have that concept at all, memory range mapping concept would be the closest to what MMU actually does. Hence the impedance mismatch between hardware and current low level abstractions created a lot of developers who "know" how all of this works but doesn't actually know, and a bit afraid to crush through layers. I want more engineers to not be afraid and be comfortable with all low level bits even if they would never touch them, because one day you will find a bug like broken compare-exchange implementation in LLVM or similar.

- Tooling is way off, the main problem with multithreading is that it's all "in runtime" and dynamic, for example if I'm making a lockfree hashmap, the only way for me to get into the edgecases of my algorithm (like two threads trying to acquire same token or something) is to run a bruteforce test between multiple threads and wait until it actually happens. Bruteforce-test-development scales very poorly, and testing something like consensus algorithms for hundreds of threads is just a nightmare of complexity of test fixtures involved. Then you get into ok, so how much testing is enough? How do you measure coverage? Lines of code? Branches? Threads-per-line? When are you sure that your algorithm is correct? Don't get me wrong, I've seen it multiple times, simple 100 lines of code passing tons of reviews only for me to find a race condition (algorithmical one) half a year later, and now it's deployed everywhere and very costly to fix. Another way would be to skip all of that and start modeling your algorithms first, TLA+ is one of the better tools for that out there, prove that your model is correct, and then implement it. Using something like TLA+ can make your multithreading coding a breeze in any language.

And probably absence of transactional memory also contributes greatly, passing 8 or 16 bytes around atomically is easy, but try 24 or 32? Now you need to build out an insanely complicated algorithm that involves a lot of mathematics just to prove that it's correct.