Because driving isn't some narrow task. It has a lot of variability in conditions. Human brain is very adaptable, yet we require 16+ years of training before they can take the wheel.
I'm not disagreeing that the task is difficult but this topic isn't about general driving - this is about replacing ultrasonic sensors with visual systems. Evolutionary arguments are meaningless here since we outperform evolution all the time even with hand engineered visual systems, let alone ML derived ones.