This takes me to the famous FizzBuzz High performance codegolf answer [1]. If we could implement optimizations like that for the inferences, maybe we could increase the speeds 10x or more.
I love scrolling and reading through this, thinking yeah of course Python is slower than Java, oh wow Rust is pretty on par I wonder what the Java devs did. Then you hit asm and your jaw drops.