| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by grovesNL 698 days ago

All great points, thank you! I'll need to think about this some more.

For context, queries are trying to find the next available equipment for tens of thousands of tasks, so hundreds of thousands of queries isn't too bad relative the to the amount of tasks. Equipment candidates are matched based on their capabilities (some equipment might support A or B, while others support B or C) and tasks might require certain combinations. This might complicate the multiset slightly but it seems like a good idea.

The ordered map I'm currently using handles adjacent available intervals by automatically coalescing them together, so I'm definitely taking advantage of that.

1 comments

hansvm 698 days ago

Yeah, so with capabilities, the naive idea of "categories" might or might not make sense. It depends how much duplication you get (how many pieces of equipment have large counts of capabilities). Most real-world scenarios I've seen only have 2-10 capabilities per piece of equipment though, which makes the simplest, dumbest solution work in the real world (anything fancier pushes closer to the database literature, and there are good solutions, but you hopefully don't have to care).

> tens of thousands of tasks

Am I misinterpreting, or are you saying that you have tens of thousands of queries and a much lower equipment count? That enables all kinds of solutions. Even on a single machine, multi-thread it. For deletions and insertions, use something like bounded-wait hazard pointers [0] with a bias toward cheap reads. If you have concurrent tasks, horizontal scaling is the name of the game. 10 microseconds is an easy baseline for intra-datacenter networking, easily allowing 10 millisecond reads and mutations even with a distributed solution.

Related, simply caching results might get you the performance you want on a single core if you only have hundreds of pieces of equipment you're searching each time. For each capability, just record the last time you mutated it. You can re-use any query result whose associated results don't have potential mutations. Especially if writes are much less frequent than reads, and especially if you have a bias toward not handing out equipment with all possible capabilities when something cheaper would suffice, most of the time you never have to hit the raw data structure, allowing a single core to handle a much higher query load even with worse data structures.

[0] a decent walkthrough: https://pvk.ca/Blog/2020/07/07/flatter-wait-free-hazard-poin...

link

grovesNL 698 days ago

Multi-threading would be really nice, although it's not clear to me how to parallelize this in a way where the synchronization cost wouldn't regress performance. Tasks are prioritized, so each booking happens in sequence (i.e., each task needs to know where the higher priority tasks ended up being placed). There are cases where it's possible to determine whether they might be fully isolated (e.g., if one booking couldn't possibly affect another) to run those in parallel, but sometimes determining that might be non-trivial too.

I have been experimenting with a few caching approaches, but the caches tend to be invalidated so often that they didn't seem to help much yet. Some of the coarser-grained caches mentioned in some of the other comments could be interesting though, as a way to quickly skip over entire regions when the remaining availability intervals are too short for typical queries.

link

hansvm 698 days ago

> multi-threading

This is one of those places where the details matter a ton. E.g. low contention hazard pointers are nearly free (under 10 clock cycles per read), and if writes are less common than reads (seems likely? you present 10-100 reads for available equipment before somebody chooses 1 or 0 of the available options) you'll barely notice the overhead. Synchronization only gets expensive when you have contention.

Examining the problem of allocating equipment, how often do you actually have contended writes on the same piece of equipment? If it's very often, presumably the top few matches are nearly interchangeable (or, even if they aren't, milliseconds separate who wins the non-interchangeable equipment, and from a user's perspective if you're not running a stock exchange they don't know if they didn't win because the equipment wasn't available or because you have a few milliseconds of fuzz behind the scenes). Can you tweak the business requirements and uniformly choose from the top k matches when displaying results to readers to artificially reduce contention?

Adding onto one of the previous ideas, even if tasks are prioritized, if outside parties don't communicate those priorities there isn't much practical difference over the span of a few milliseconds between low and high priority tasks. Sure, if a high priority task has been waiting longer than a few milliseconds you can prioritize it over low priority tasks, but you have a few milliseconds of slop available to quickly execute a "maybe wrong" solution and increase system throughput.

> caching doesn't work

That happens. It's almost always possible to construct malicious workloads against any caching strategy. If your real workload isn't very cachable, that's annoying, but it's fine.

Did you happen to measure the relative performance of a cache hit vs a cache miss? Your perf budget is fairly tight, and I'd be curious to know if the lack of improvement is because the caches aren't optimized or because the misses reduce the gains.

link