They have two eyes close together that track depth. Plus they have hearing and touch, multiple kinds of sensor fusion that the Tesla doesn’t even come close to having with a plain old video camera.
Tesla does extremely dumb, basic depth estimation from two cameras. It still lacks many senses that are critical to movement, and its software roughly matches the abilities of a small bird.