i gave up on the problem about 18 months ago, so i haven't kept up with the research area. is this yours? the runtimes are of course very good, but i don't see a comparison of how good the approximation is vs telamalloc (or just ILP). i'll say this though: it's miraculous that the impl is so small.