It is my understanding that the M1 chip has unified CPU/GPU memory, which means that Metal, as the underlying framework, might be clever enough not to copy the data at all.
Not sure it applies to his use case, though.
I was mostly talking about the RTX 2080 Ti, which he is comparing against.
It's like moving just across the street by loading every single box into a car, crossing the street, then unloading it, instead of just walking over on foot. You need to drive further (bigger networks) and load more boxes at once (larger batch size) for the car to actually be useful in this scenario.
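To put rough numbers on the analogy, here's a toy cost model. It's only a sketch under made-up assumptions: the hardware figures (`cpu_flops`, `gpu_flops`, `bus_bytes_per_s`, `launch_overhead_s`) and the function names are hypothetical placeholders, not measurements of the M1, the 2080 Ti, or any real framework. The GPU pays a fixed overhead plus a per-sample copy cost (loading the car), then does the actual work faster.

```python
# Toy cost model for the "moving boxes" analogy.
# All hardware numbers are HYPOTHETICAL placeholders, not real measurements.

def cpu_time(batch_size, flops_per_sample, cpu_flops=1e11):
    """Walking: no transfer cost, but slower per unit of work."""
    return batch_size * flops_per_sample / cpu_flops

def gpu_time(batch_size, flops_per_sample, bytes_per_sample,
             gpu_flops=1e13, bus_bytes_per_s=1e10, launch_overhead_s=1e-4):
    """Driving: fixed overhead + copying the boxes over + faster work."""
    transfer = batch_size * bytes_per_sample / bus_bytes_per_s
    compute = batch_size * flops_per_sample / gpu_flops
    return launch_overhead_s + transfer + compute

def gpu_pays_off(batch_size, flops_per_sample, bytes_per_sample):
    """True if offloading to the (hypothetical) GPU beats the CPU."""
    return (gpu_time(batch_size, flops_per_sample, bytes_per_sample)
            < cpu_time(batch_size, flops_per_sample))

# Medium network: batch size decides whether the fixed overhead is amortized.
print(gpu_pays_off(1, 1e5, 400))      # batch of 1
print(gpu_pays_off(1000, 1e5, 400))   # batch of 1000
# Tiny network: even a huge batch doesn't help, the copy alone costs more
# than just doing the work on the CPU.
print(gpu_pays_off(1_000_000, 1e4, 4000))
```

With these placeholder numbers the medium network only wins on the GPU at the larger batch, while the tiny network never does, because copying each sample costs more than computing it on the CPU. Setting the transfer term to zero would loosely mimic the zero-copy case that unified memory might allow, though whether that happens in practice depends on the framework.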