|
|
|
|
|
by hansvm
698 days ago
|
|
Yeah, so with capabilities, the naive idea of "categories" might or might not make sense. It depends how much duplication you get (how many pieces of equipment have large counts of capabilities). Most real-world scenarios I've seen only have 2-10 capabilities per piece of equipment though, which makes the simplest, dumbest solution work in the real world (anything fancier pushes closer to the database literature, and there are good solutions, but you hopefully don't have to care). > tens of thousands of tasks Am I misinterpreting, or are you saying that you have tens of thousands of queries and a much lower equipment count? That enables all kinds of solutions. Even on a single machine, multi-thread it. For deletions and insertions, use something like bounded-wait hazard pointers [0] with a bias toward cheap reads. If you have concurrent tasks, horizontal scaling is the name of the game. 10 microseconds is an easy baseline for intra-datacenter networking, easily allowing 10 millisecond reads and mutations even with a distributed solution. Related, simply caching results might get you the performance you want on a single core if you only have hundreds of pieces of equipment you're searching each time. For each capability, just record the last time you mutated it. You can re-use any query result whose associated results don't have potential mutations. Especially if writes are much less frequent than reads, and especially if you have a bias toward not handing out equipment with all possible capabilities when something cheaper would suffice, most of the time you never have to hit the raw data structure, allowing a single core to handle a much higher query load even with worse data structures. [0] a decent walkthrough: https://pvk.ca/Blog/2020/07/07/flatter-wait-free-hazard-poin... |
|
I have been experimenting with a few caching approaches, but the caches tend to be invalidated so often that they didn't seem to help much yet. Some of the coarser-grained caches mentioned in some of the other comments could be interesting though, as a way to quickly skip over entire regions when the remaining availability intervals are too short for typical queries.