by cleak 812 days ago
Author here. I'm a bit confused by this response. Sure, going full ECS and processing all objects in a single function call (or a few) is likely to be faster. But there's an obvious tradeoff in solution complexity.

Both Unreal and Unity make heavy use of per-instance per-frame virtual functions. It's a paradigm that has clear value. Why not make it cheaper at no cost to the dev? The option to take a systems approach to bottlenecks is the same afterwards.


> Both Unreal and Unity make heavy use of per-instance per-frame virtual functions. It's a paradigm that has clear value.

It's a paradigm that had (debatable) value in the late 1990s and early 2000s, when Unreal Engine and Unity were designed and OOP was all the rage ;)

In the meantime we realized that not everything should in fact be an (OOP) object and runtime polymorphism is usually not needed to build games, even if the whole "animal => mammal => dog" classification thing at first sounds like it would be a good match to model game objects.

Runtime polymorphism is like memory management in this respect. If memory management shows up in profiling at all, it's better to drastically reduce the frequency of allocations than to look for a faster general-purpose allocator (and if it doesn't show up in profiling, which is how it should be, integrating a faster allocator doesn't make much sense either because it won't make a difference).

Of course, stuff like this is not easy to do late in a project because it would involve a complete redesign of the architecture and rewriting all code from scratch. That's why dropping in a faster allocator sometimes makes sense as a compromise, but it's not a fix for the underlying problem, just a crutch.

Also, the bigger problem with indirect function calls is usually not the call overhead but that they present a hard optimization barrier for the compiler.

> It's a paradigm that had (debatable) value in the late 1990s and early 2000s, when Unreal Engine and Unity were designed and OOP was all the rage ;)

For inheritance, I 100% agree. Composition all the way. I think it has value as an interface though - at least for quick bring up and fast iteration. It can of course bring scaling challenges - I recently worked on a project that had hundreds of devs and more than 50k game components. That brought all of the architectural and performance challenges you'd expect from this approach.

> Also, the bigger problem with indirect function calls is usually not the call overhead but that they present a hard optimization barrier for the compiler.

In the years I've had to think about this, I'd take a slightly different approach that should be more amenable to compiler optimization. I'd maintain separate lists for each concrete type and have a type-aware process function (via templates) which requires all overrides to be marked final. That should allow the compiler to do inlining, avoid indirections, etc. The major downside here is handing over a footgun to the dev - forget the final keyword, or pass the object in as anything other than its concrete type, and performance will suffer. I'd probably still walk the vtable to see if a function has been overridden - it's unfortunate that there doesn't seem to be a way to do this without resorting to such tricks.
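
A minimal sketch of the per-type-list idea described above, assuming C++17. The names `Component`, `Mover`, `Spinner`, `Scene` and `process` are hypothetical and not taken from the article or from any engine. It also marks whole classes final rather than individual overrides, which is a stricter variant chosen here because `std::is_final` lets a `static_assert` turn a forgotten `final` into a compile error instead of a silent slowdown.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical base class standing in for an engine's component interface.
struct Component {
    virtual ~Component() = default;
    virtual void update(float dt) = 0;
};

// Concrete types are marked final, so a call made through a Mover& cannot
// reach a further override and the compiler is free to devirtualize it.
struct Mover final : Component {
    float x = 0.0f;
    float speed = 2.0f;
    void update(float dt) override { x += speed * dt; }
};

struct Spinner final : Component {
    float angle = 0.0f;
    void update(float dt) override { angle += 90.0f * dt; }
};

// Type-aware process function: one instantiation per concrete type.
template <typename T>
void process(std::vector<T>& items, float dt) {
    static_assert(std::is_base_of_v<Component, T>, "T must derive from Component");
    static_assert(std::is_final_v<T>, "mark T final so update() can be devirtualized");
    for (T& item : items) {
        item.update(dt);  // static type is final: a direct, inlinable call is possible
    }
}

// One contiguous list per concrete type instead of a single list of Component*.
struct Scene {
    std::vector<Mover> movers;
    std::vector<Spinner> spinners;

    void update(float dt) {
        assert(dt >= 0.0f);  // assumed: frame time is never negative
        process(movers, dt);
        process(spinners, dt);
    }
};
```

Whether the calls are actually inlined still depends on the compiler and optimization level, so it is worth checking the generated code. The second half of the footgun remains: passing a `Mover` around as a `Component&` goes back through the vtable.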

Hi Author. I did not say that, but I did think it. I understand the confusion: you have no way of knowing whether the person giving the admonishment "don't do this" is just sniffing glue, so I will try to explain why I think most people probably shouldn't do this.

Firstly, I think the graph in your article shows no advantage except with a cold cache. I agree with this, and think we probably disagree on how likely a "cold cache" is, or how to model that when thinking about a program. I argue you should never bother.

In a game (or other realtime scenario that runs continuously) you understand that every "n" frames you are going to spill cache "m" times, and that because the game runs continuously "m/n" averages out to some value.

I have worked on code that aims for m=0, and it's a little bit like superconductivity in some respects: strange things happen with your code when it looks like that. But remember that L1 is only 64KB, so if your program gets any bigger than that or ever does a system call and jumps into the kernel (which is definitely bigger than that) then m is at least 1.

Microbenchmarks tend to be m=0 which makes working with them tricky. Real programs are usually at least m=1 and usually much much bigger: m=1 is still something like 100GB (gigabytes) per second.

Now through all that, it is tempting to try and think of m as the sum of latencies, but it's important to remember that the size of the code matters too: those instructions have to be fetched from memory just like data does, so if your program gets too big (as can happen if you naively unroll every lookup table with its own function) m is going to get bigger faster than anything else. Look at objdump. Look at how big your program gets doing this. How many multiples of 64KB have you eaten with this trick? Each one of those is another m.

That is why this trick is almost certainly never worth it, and if you ever find a big codebase that benefits from a few strategic hoists of this technique I bet putting a prefetch intrinsic in the right place would help more.
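
For readers who haven't used one, here is a rough sketch of what "a prefetch intrinsic in the right place" can look like. It assumes GCC or Clang, where `__builtin_prefetch` is available, and compiles to a no-op elsewhere. The `Entity` type, the `regenAll` function and the lookahead distance of 8 are all made up for illustration; a real distance would have to be tuned by measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical game object reached through a pointer, so each access
// may miss the cache.
struct Entity {
    float health;
    float regen;
};

#if defined(__GNUC__) || defined(__clang__)
// Arguments: address, 0 = prefetch for read, 3 = keep in all cache levels.
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)  // assumed fallback: do nothing
#endif

// Applies regeneration to every entity and returns the summed health.
float regenAll(const std::vector<Entity*>& entities, float dt) {
    constexpr std::size_t kAhead = 8;  // assumed lookahead, tune by profiling
    const std::size_t n = entities.size();
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kAhead < n) {
            // Ask for a later entity now so it is (hopefully) in cache
            // by the time the loop reaches it.
            PREFETCH_READ(entities[i + kAhead]);
        }
        assert(entities[i] != nullptr);
        Entity& e = *entities[i];
        e.health += e.regen * dt;
        total += e.health;
    }
    return total;
}
```

A prefetch is only a hint and can just as easily hurt as help, so this is something to try under a profiler rather than sprinkle around.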

I recently watched a talk which covers some of the runtime dynamics you’re talking about and it’s very interesting! The mental model never adequately represents real machine architecture.

https://youtu.be/i5MAXAxp_Tw

You don't need to go full ECS. You can write a static member function that takes a set of objects of that class, and write a non-virtual inlineable member function, and still use your SceneManager approach to keep track of them if you like.

It doesn't feel like there's "no cost to the dev" here because developers will still sooner or later need to understand the relevant code.
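
The static-member-function suggestion above might look roughly like this. It's only a sketch: `Particle`, `updateAll` and the shape of `SceneManager` are assumptions made for the example, not the article's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Particle {
public:
    Particle(float y, float vy) : y_(y), vy_(vy) {}

    // Non-virtual and defined in-class, so it is implicitly inline.
    void update(float dt) { y_ += vy_ * dt; }

    // Static batch entry point: one call per frame for the whole set.
    static void updateAll(std::vector<Particle>& particles, float dt) {
        for (Particle& p : particles) {
            p.update(dt);
        }
    }

    float y() const { return y_; }

private:
    float y_;
    float vy_;
};

// Hypothetical SceneManager that still owns and tracks the objects,
// but drives them through the batch function instead of a virtual call
// per instance.
class SceneManager {
public:
    // Returns the index of the new particle.
    std::size_t addParticle(float y, float vy) {
        particles_.emplace_back(y, vy);
        return particles_.size() - 1;
    }

    const Particle& particle(std::size_t index) const {
        assert(index < particles_.size());
        return particles_[index];
    }

    void tick(float dt) { Particle::updateAll(particles_, dt); }

private:
    std::vector<Particle> particles_;
};
```

The per-frame cost is then one ordinary function call per class rather than one indirect call per object, and the loop body is visible to the optimizer.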