However, I think PyTorch works the same way: gradients are accumulated in place on the leaf tensors. At least, that is what their docs say:
"This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it." - https://docs.pytorch.org/docs/stable/generated/torch.autogra...
The Rust burn crate handles this better: it stores the backpropagated gradients in a separate container and returns that container. https://github.com/tracel-ai/burn/blob/af381ee18566fc27f5c98...
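
To make the difference concrete, here is a rough pure-Python sketch of the two designs on a toy scalar autodiff. It is not PyTorch's or burn's actual code: Var, backward_accumulate and backward_collect are names I made up for illustration, and the only thing it tries to show is where the gradients end up.

```python
# Toy scalar autodiff, only to contrast the two gradient-storage designs.
# Var, backward_accumulate and backward_collect are hypothetical names,
# not part of PyTorch or burn.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0         # used only by the accumulate-in-place design

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))


def _reverse_topo(root):
    """List the graph so that every node comes before its parents."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return list(reversed(order))


def backward_collect(root):
    """Separate-container design: return a fresh {Var: gradient} dict."""
    grads = {root: 1.0}
    for node in _reverse_topo(root):
        g = grads.get(node, 0.0)
        for parent, local in node.parents:
            grads[parent] = grads.get(parent, 0.0) + g * local
    return grads


def backward_accumulate(root):
    """In-place design: add each leaf's gradient onto its .grad field."""
    for node, g in backward_collect(root).items():
        if not node.parents:  # leaves only
            node.grad += g


x = Var(3.0)
y = x * x + x                  # dy/dx = 2x + 1 = 7 at x = 3

backward_accumulate(y)
backward_accumulate(y)         # second call adds on top of the first
print(x.grad)                  # 14.0, because nothing zeroed x.grad

print(backward_collect(y)[x])  # 7.0 on every call, no hidden state
```

With the in-place version the caller has to remember to reset x.grad between calls, which is the situation the PyTorch note warns about. With the returned container there is nothing to reset, at the cost of having to pass that container on to whatever consumes the gradients.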