| There is a lot in that statement and it deserves challenge. Note two things: there is no universal right answer here, sometimes you are correct, sometimes you are wrong; my responses will be contradictory (and also contradict things I didn't think to write) each project will need to figure out which of the combinatorical points is important their project to come up with the right also. Not doing this is premature pessimization - getting your data structures wrong up front will force slow runtime on you in a way that no micro-optimization (which is what premature optimization is really referring to) can ever fix and getting out will cost a very expensive rewrite. > 1. Such data structures (or more generally, std::vector<std::string, std::vector<std::string>> or something like that) are the natural way to represent e.g. dictionaries. They are natural, and easy but that doesn't mean they are the right way. Often I find with a little thinking that there is a enum key under it all that I can hard code - and the enum key is also type safe so that the compiler can prove my code is correct against some class of bugs in a way a string as key cannot. Why are your keys and values std::string which (baring small string optimization) will allocate more? Often you can place a maximum length on one of both that is small enough that you can replace std::string with a struct containing a char array (or std::string_view - I still have to support C++14 so I haven't been able to try it) and avoid a lot of allocations. Failing that, often the correct data structure is a database (I reach for sqlite first, but there are plenty of options with pros and cons) which is fast lookup, allows for more complex structures easially, it persists the data to disk so that next startup I don't have to spend all the time reconstructing all that data (realistically performance of creating that large data dictionary is of far greater concern that getting rid of it). > 2. This general issue extends to virtually all collections. The idea that you should avoid large collections is not a practical solution to real world problems. This article is about systems programming. In systems programming you rarely have such a large dictionary in that format. You often will in something like the filesystem, but there the point is to have the data persisted from disk and typically you only read the subset you care about. You may have a copy in cache (and cache invalidation becomes an issue), but the in memory portion of data itself is never in a dictionary of that format. > 3. An alternative solution would be lazy destruction, but that comes with its own issues, such as a really bad worst-case memory overhead or making RC sufficiently more complex that it's not really a win over tracing GC anymore [1]. That is one option, there are more. Intentionally leak that whole dictionary. There is no reason to care if all the destructors run for any of the examples you presented and realistically if you are getting rid of the whole thing you are probably in shutdown anyway so who cares if you leak memory? Many systems programs and libraries do this. Move the data to a different thread that only handles reclaiming memory and then don't worry about it. (in discussion we talk about making that thread idle time priority so it won't affect performance - but in practice all schedulers I'm aware of do this poorly in some way) Finally, if after reading all that and consideration all your options (including things I didn't think of!) if you still conclude a large dictionary of strings to strings is your correct answer, then you probably should demand good tracing garbage collector. I'm not against using them where they are appropriate - I'm just arguing that if you think about your data a little more you will discover they are not needed nearly as much as it seems - and this time spending thinking is usually worth it because it results in much faster programs anyway (even if there is one large dictionary in the code how many others did you get rid of?). |
> Often I find with a little thinking that there is a enum key under it all that I can hard code - and the enum key is also type safe so that the compiler can prove my code is correct against some class of bugs in a way a string as key cannot.
There's a fundamental misunderstanding here, I think. By dictionaries I mean language dictionaries, e.g. English -> French. You won't find an underlying "enum key" here. Nor am I sure how an enum would ever get large.
> This article is about systems programming. In systems programming you rarely have such a large dictionary in that format.
First, the article is, but the subthread that started this discussion was more general than that.
Second, even in systems programming, you will commonly have arrays or other collections. Caches are a very common thing in systems programming, after all.
> That is one option, there are more.
It is trivially true that there are alternative options, but all these are workarounds around the problem, i.e. you have to complicate your design or make silent assumptions that may not hold in the future (e.g. when disposing of memory at process exit does no longer work because you now need the code as a library).