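Jokes aside: this is PyTorch, so the actual work is dispatched to C++ or CUDA kernels rather than run as Python. The problem most likely comes from the different functions that get called for `+=` versus `+`: `a += b` goes through `Tensor.__iadd__` and updates `a` in place, while `a + b` goes through `Tensor.__add__` and allocates a new tensor.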
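A quick way to see the `+=` vs `+` difference for yourself is a small sketch like the one below. It assumes a plain CPU tensor that does not require grad; the names `a`, `b` and `alias` are just for illustration. It checks `data_ptr()` to show that `+=` reuses the same storage (so every other reference to that tensor sees the change), while `+` produces a fresh tensor and leaves the old one untouched.

```python
import torch

a = torch.zeros(3)
b = torch.ones(3)

alias = a                  # second name bound to the same tensor object
ptr_before = a.data_ptr()  # address of a's underlying storage

# += calls Tensor.__iadd__: an in-place add, so the storage is reused
a += b
assert a.data_ptr() == ptr_before
print(alias)               # tensor([1., 1., 1.]) -- the alias sees the update

# + calls Tensor.__add__: an out-of-place add that allocates a new tensor
a = a + b
print(a.data_ptr() == ptr_before)  # False -- a now points at new storage
print(alias)               # tensor([1., 1., 1.]) -- the old tensor is unchanged
```

This aliasing behaviour is usually what bites people: if some other variable, list entry or view holds the same tensor, `+=` silently changes it too, and autograd will also complain about in-place ops on leaf tensors that require grad.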