i gave up on the problem about 18 months ago so i didn't keep up with the research area. is this yours? the runtimes are of course very good but i don't see a comparison of how good the approximation is vs telamalloc (or just ILP). i'll say this though: it's miraculous that the impl is so small.
> Right now I'm looking into integrating it with IREE
clever guy. IREE is just about the only serious/available runtime where you can do this, because IREE codegens the runtime calls as well as the kernel code. But you're gonna have to either patch an existing HAL or write a new one to get what you want. If you want, I can help: go to the discord (https://discord.gg/J68usspH) and ask about this in #offtopic, and I'll DM you and point you to the right places.
http://adambuchsbaum.com/papers/dsa-stoc03.pdf
https://link.springer.com/chapter/10.1007/978-3-540-27798-9_...
https://users.cs.northwestern.edu/~pdinda/ics-s05/doc/dsa.pd...
there are also some from ML people trying to allocate memory optimally for DNNs:
https://arxiv.org/abs/1907.01989
https://arxiv.org/abs/1804.10001
https://arxiv.org/abs/2001.03288
they all boil down to a couple of greedy heuristics. the most recent "cool" paper was from a group at google:
https://dl.acm.org/doi/10.1145/3567955.3567961
basic idea is to use both ILP and heuristics. i asked them to open source it but no dice :(
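
to make "a couple of greedy heuristics" concrete, here's a minimal sketch of one common flavor: sort buffers by size (largest first) and first-fit each one into the lowest offset that doesn't collide with an already-placed buffer whose lifetime overlaps. this is my own illustration, not code from any of the papers above; the names `Buf` and `greedy_by_size` and the inclusive-lifetime convention are assumptions.

```python
from typing import NamedTuple


class Buf(NamedTuple):
    start: int  # first time step the buffer is live (inclusive, assumed convention)
    end: int    # last time step the buffer is live (inclusive)
    size: int   # bytes


def greedy_by_size(bufs):
    """Return (offsets, peak): offsets[i] is the byte offset chosen for bufs[i]."""
    offsets = [None] * len(bufs)
    placed = []  # indices that already have an offset
    # Largest buffers first; ties keep their input order (sorted is stable).
    order = sorted(range(len(bufs)), key=lambda i: -bufs[i].size)
    for i in order:
        b = bufs[i]
        # Already-placed buffers whose lifetimes overlap b's, sorted by offset.
        conflicts = sorted(
            (offsets[j], bufs[j].size)
            for j in placed
            if bufs[j].start <= b.end and b.start <= bufs[j].end
        )
        # First fit: take the lowest gap between conflicts that is big enough.
        candidate = 0
        for off, size in conflicts:
            if off - candidate >= b.size:
                break
            candidate = max(candidate, off + size)
        offsets[i] = candidate
        placed.append(i)
    peak = max((offsets[i] + bufs[i].size for i in range(len(bufs))), default=0)
    return offsets, peak


bufs = [Buf(0, 1, 4), Buf(1, 2, 2), Buf(2, 3, 4), Buf(3, 4, 2)]
offsets, peak = greedy_by_size(bufs)
print(offsets, peak)  # [0, 4, 0, 4] 6
```

on this toy input the heuristic happens to hit the lower bound (max live bytes at any time step is 6), but in general it's only a heuristic, which is why the comparison vs an ILP baseline matters.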