| I am a principal engineer for a major autonomous vehicle company. You can break this statement down into two components: Adding more sensors slows his team now more than it improves system performance. I'll take his word on this. It is a lot of work to incorporate multiple sensors. All necessary information is already in the pixel-space. I hate to disagree with someone as distinguished as Karpathy, but this is simply not what I have observed from all of that data that we have access to. Given my knowledge of the various stacks deployed today, I would never ever ever get into a vehicle using a vision only stack and expect it to perform in some of the challenging environments encountered during testing. |
The fact that (most) humans manage to drive around safely and successfully on current roads proves that the information needed exists in the pixel-space (not just the current image but, say, the current image plus history). We don't yet have stacks that can successfully extract everything needed from this information, but I don't think Dr. Karpathy ever claimed that we do.
(I am not a principal engineer but a mere PhD student who argues daily with people about how RGB information is underappreciated and underutilized.)