Hacker News new | ask | show | jobs
by sumanthvepa 3249 days ago
The problem with asking a programmer to keep track of the locality of their data, is that most modern programming languages make reasoning about locality hard to do. With the exception of C and C++. Even for those languages, unless all relevant data is in simple arrays, making assertions about locality is hard.

For interpreted languages like Python or Javascript, figuring out RAM storage and access patterns of data is very hard. So we probably need programming language mechanisms to help with understanding the locality patterns of our programs and probably tooling to help change it.

3 comments

Even for C++, most designs are OOP, which treats data layout as an afterthought.
> Even for those languages, unless all relevant data is in simple arrays, making assertions about locality is hard.

It depends on the libraries you use. Take a look: https://github.com/Const-me/CollectionMicrobench

As you see, in practice, a good linked list is same or slightly faster than std::vector. And it’s consistently 2-3 times faster than equally linked std::list.

That’s not just synthetic tests. Recently, I’ve got 2.5x performance improvement in my app just by switching from std::unordered_map to CAtlMap with the same keys/values.

Theoretically, C++/11 fixes that with stateful allocators. Practically, I’ve not seen good open source ones with the performance comparable to CAtlPlex that powers these ATL node-based collections. I’m not even sure it’s possible. STL is too standardized and too old. It might be there’s no room in its allocators API for sufficient level of integration between a collection and it’s backing stateful allocator.

Lots of developers are blind to issues of local reasoning. It becomes a self fulfilling prophecy, because the code they write is often inscrutable. The people who generally can follow things like this can't anymore.