by threeseed 1321 days ago
> The fact that (most) humans manage to drive around safely and successfully in current roads proves that the information needed exists in the pixel-space

But that doesn't mean that it translates to a car.

We constantly move our 576MP resolution eyes in multiple orientations in order to visualise a scene and focus on the most important areas. Cars have fixed, low-quality cameras.

We then interpret this data using the most advanced pattern recognition system the world has ever seen, one that is trained for 20+ years to fully comprehend the behaviour of everything this planet has to offer. Cars don't have anything close to this.


You seem to be arguing that a 576MP resolution (where did you even get this number, when people still argue about what a fair comparison between the human eye and a camera is?) and having to move your head and eyes to take in your surroundings are good things, compared with having multiple fixed cameras covering the entire surroundings all the time? If resolution mattered that much, every car would have ultra-high-resolution cameras on it.

Humans certainly have a stronger and more general prior for making sense of the information, and that's exactly why I left it as a possibility. Cars don't *yet* have anything close to it, just like they didn't have a way to accurately detect objects a few years ago, and just like they didn't have a way to capture RGB information a few decades ago.

I am an optimistic guy, and I certainly believe in the power of learning at scale.

> 576MP

Actually our eyes are more like 8MP: https://www.picturecorrect.com/what-is-the-resolution-of-the...

Perhaps higher synthetic resolution from moving our eyes about, or perhaps that is meaningless.

It could be reframed as saying we have a peak acuity equivalent to a 576MP camera of the same FOV, with a theoretical max of 20 samples per second (50 ms to move between targets; realistically it's probably more like single digits). The 8MP comparison is only relevant if so many targets need constant full resolution that you can't focus on all of them, or if the targets are larger than the peak-acuity FOV. In practice this is not the case: we can identify something once and keep tracking it in the periphery without issues, and anything that large will likely be extremely easy to identify.
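
For what it's worth, the 20 samples per second figure is just the reciprocal of the 50 ms retargeting time. A quick sketch of that arithmetic, where the function name is my own and the 150 ms value is purely an illustrative slower case, not a measured number:

```python
def max_fixations_per_second(retarget_ms: float) -> float:
    """Upper bound on gaze targets per second if each retarget takes retarget_ms."""
    if retarget_ms <= 0:
        raise ValueError("retarget_ms must be positive")
    return 1000.0 / retarget_ms


# 50 ms per retarget is the figure from the comment above.
print(max_fixations_per_second(50))   # 20.0

# Hypothetical slower retarget time, chosen only to show a single-digit rate.
print(max_fixations_per_second(150))
```

Anything slower than 100 ms per retarget already drops the rate below 10 per second, which is where the "single digits" guess would come from.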
That doesn't make sense: a camera doesn't get more pixels just because it is taking a video while tracking something. Nor would it if it had zoom and a controlled gimbal.
If you turned that tracked video into a panorama, it would. The same goes if you took 10 zoomed photos and stitched them over the top of an unzoomed photo. The point is that unless the task demands more focus areas than the eye can cover in a given window, the visual acuity (for the parts of the scene that matter) is higher than that of an 8MP shot of the entire scene.
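
To put rough numbers on the stitching argument, here is a back-of-the-envelope sketch. It assumes an idealised sensor where zooming by a factor z narrows the view by z in each dimension, so the same pixel count lands on 1/z² of the scene. The function names and the 4x zoom are my own illustrative choices, not anything measured about eyes or car cameras:

```python
def equivalent_full_scene_megapixels(sensor_mp: float, zoom: float) -> float:
    """Megapixels a single wide shot would need to match the pixel
    density of one zoomed patch (idealised: density scales with zoom**2)."""
    if sensor_mp <= 0 or zoom < 1:
        raise ValueError("sensor_mp must be positive and zoom must be >= 1")
    return sensor_mp * zoom ** 2


def scene_fraction_covered(zoom: float, n_patches: int) -> float:
    """Fraction of the wide scene covered by n non-overlapping zoomed patches."""
    if zoom < 1 or n_patches < 0:
        raise ValueError("zoom must be >= 1 and n_patches must be >= 0")
    return min(1.0, n_patches / zoom ** 2)


# Hypothetical setup: an 8MP sensor, 10 zoomed photos at 4x zoom.
print(equivalent_full_scene_megapixels(8, 4))  # 128
print(scene_fraction_covered(4, 10))           # 0.625

# 576 / 8 = 72, so matching 576MP density from an 8MP sensor
# would take a zoom of sqrt(72), roughly 8.5x, under this model.
print(72 ** 0.5)
```

So under these assumptions the stitched image is not 128MP everywhere, but the patches you chose to zoom in on are sampled as if it were, which is the only part that matters for the argument.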