by TheSpiceIsLife 2462 days ago
This looks to be either 36 or 40, can't quite count the discrete units correctly from the photo.

Let's say it's 40 x ~14MP cameras fixed into a housing.

What's stopping anyone from using higher resolution subunits and more of them?

Or, if you bolt ten of these things together do you get a 5000MP camera?

4 comments

Well, they are having to align some number of sensors ensuring sufficient overlap to ensure that they can process the images together. This means they either are warping, aligning, and combining the image into one frame to do processing (computationally intensive), or they have significantly more overlap to ensure that one of the frames contains the entire face (storage intensive), or they are doing some super-resolution processing trickery with lower resolution sensors.

500MP x 10 ips (images per second) is 5 billion pixels per second that they have to process. That is equivalent to processing about 20 4K30 streams at once, even without taking into consideration the extra data you would need. Then you actually need to store the data somewhere and do the facial recognition. How many of these do you think would be reasonable to have in an area?
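
As a quick sanity check on that comparison, here is a back-of-envelope sketch. It assumes "ips" means images per second and that "4K" means 3840x2160 UHD; neither is spelled out in the comment.

```python
# Back-of-envelope throughput check for the figures quoted above.
SENSOR_PIXELS = 500_000_000   # 500 MP per capture (from the comment)
CAPTURES_PER_SECOND = 10      # "10 ips", read here as images per second

UHD_4K_PIXELS = 3840 * 2160   # assumption: "4K" means 3840x2160 UHD
UHD_4K_FPS = 30               # the "30" in "4K30"


def pixels_per_second(pixels_per_frame: int, frames_per_second: int) -> int:
    """Raw pixel rate of a stream, ignoring bit depth and compression."""
    return pixels_per_frame * frames_per_second


camera_rate = pixels_per_second(SENSOR_PIXELS, CAPTURES_PER_SECOND)
stream_rate = pixels_per_second(UHD_4K_PIXELS, UHD_4K_FPS)
equivalent_streams = camera_rate / stream_rate

print(f"{camera_rate:,} pixels/s")
print(f"{equivalent_streams:.1f} equivalent 4K30 streams")
```

This only counts raw pixels; bit depth, compression and overlap between sensors would all change the real data rate.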

> Well, they are having to align some number of sensors ensuring sufficient overlap to ensure that they can process the images together.

It would be much easier to avoid the issue of stitching altogether and simply process the images separately, then merge the resulting output data. As far as I'm aware, you're not going to find a GPU that can process a 500MP image efficiently.

But when the images are processed separately, your required actual recorded resolution could go up quickly as your overlap between imagers needs to ensure that a face to be identified is contained within a single imager, or at least a large enough portion of the face is contained within an imager to give a sufficiently high confidence of being correct. So as I mentioned, this becomes even more storage intensive, though because there is less image processing, it becomes more CPU efficient.
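
A rough 1-D sketch of that overlap argument, assuming imagers laid out in a single row with a fixed pixel overlap between neighbours. The function names and the numbers in the example are hypothetical, chosen only for illustration.

```python
from typing import Optional


def containing_imager(face_left: int, face_width: int, imager_width: int,
                      overlap: int, n_imagers: int) -> Optional[int]:
    """Index of the first imager that fully contains the face, else None."""
    stride = imager_width - overlap        # offset between adjacent imagers
    for i in range(n_imagers):
        left = i * stride
        right = left + imager_width        # exclusive right edge
        if left <= face_left and face_left + face_width <= right:
            return i
    return None


def all_faces_contained(face_width: int, imager_width: int,
                        overlap: int, n_imagers: int) -> bool:
    """True if a face at every possible position fits inside one imager."""
    stride = imager_width - overlap
    scene_width = stride * (n_imagers - 1) + imager_width
    return all(
        containing_imager(x, face_width, imager_width, overlap, n_imagers)
        is not None
        for x in range(scene_width - face_width + 1)
    )


def redundant_fraction(imager_width: int, overlap: int, n_imagers: int) -> float:
    """Share of recorded pixels that duplicate scene already covered."""
    stride = imager_width - overlap
    scene_width = stride * (n_imagers - 1) + imager_width
    return 1 - scene_width / (imager_width * n_imagers)


# Hypothetical numbers: 4 imagers, 1000 px wide, faces 200 px wide.
print(all_faces_contained(200, 1000, overlap=200, n_imagers=4))  # overlap == face width
print(all_faces_contained(200, 1000, overlap=100, n_imagers=4))  # overlap < face width
print(redundant_fraction(1000, overlap=200, n_imagers=4))
```

In this simplified model every face is guaranteed to sit inside a single imager only when the overlap is at least one face width, which is where the extra recorded pixels come from.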
I disagree. If you miss an individual on the first capture, there's always the second capture. The number of people you fail to detect because their face is halfway between two screens would likely be far exceeded by the number of faces you fail to detect because there's an issue within the algorithm itself or simply because the face isn't fully visible.

That's okay. There are of course limitations. You can't detect faces you can't see, for example (i.e. somebody walking the wrong direction). If you're detecting 10k faces per frame at an accuracy of 99.99%, you'd expect about one failure in every detection frame.
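
The arithmetic behind that last figure, as a small sketch. It assumes each detection fails independently with the same probability, which the comment does not claim and which real systems may well violate.

```python
# Expected failures per frame for the figures in the comment above.
faces_per_frame = 10_000
accuracy = 0.9999                      # 99.99% per-face accuracy

# Expected number of misses in one frame: n * (1 - p).
expected_failures = faces_per_frame * (1 - accuracy)

# Assumption: failures are independent, so the chance that a given frame
# contains at least one miss is 1 - p^n.
p_at_least_one_failure = 1 - accuracy ** faces_per_frame

print(f"expected failures per frame: {expected_failures:.2f}")
print(f"P(at least one failure):     {p_at_least_one_failure:.3f}")
```

So "one per frame" is an average: under the independence assumption, some frames would have no failures and others more than one.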

I guess the headline here isn't that these plucky scientists managed to stick a bunch of cameras together, but rather that the Chinese state can stick one of these in a crowded area and instantly identify everybody within it.
> these plucky scientists

I assume you used that word sarcastically?

   From Wiktionary:

   plucky (comparative pluckier, superlative pluckiest)

     Having or showing pluck, courage or spirit
     in trying circumstances.

   Synonyms

    brave
    spunky
    feisty
I dunno. IMO a "plucky" Chinese scientist would be one who would refuse to work on something as dystopian as this.
I think that was sarcasm...
Yeah, I guess achieving arbitrarily huge images is just a matter of increasing the resolution/number of sensors and then applying the correct optics. The bottleneck (and technical achievement) of a system like this would be the processing power required for stitching and analyzing these massive images in anything close to real time.
The processing can be parallelized too. Just split the data into chunks and hand it out to GPUs.

I don't foresee any scaling limits really, apart from $$$ and power availability.
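
A minimal sketch of the "split into chunks and hand them out" idea, using a thread pool as a stand-in for a set of GPUs. The 25000x20000 frame size, the 4096 px tile size and `process_tile` are all hypothetical; a real system would also need overlapping tiles so faces on tile borders aren't cut in half.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Tile = Tuple[int, int, int, int]


def split_into_tiles(width: int, height: int, tile: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles that together cover the whole frame."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))


def process_tile(t: Tile) -> int:
    """Hypothetical stand-in for per-tile GPU work such as face detection.

    Here it only returns the number of pixels in the tile.
    """
    left, top, right, bottom = t
    return (right - left) * (bottom - top)


def process_image(width: int, height: int, tile: int, workers: int = 4) -> int:
    """Process every tile on a worker pool and combine the results."""
    tiles = list(split_into_tiles(width, height, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_tile, tiles))


# Assumed frame size: 25000 x 20000 = 500 MP.
print(len(list(split_into_tiles(25_000, 20_000, 4096))), "tiles")
print(f"{process_image(25_000, 20_000, 4096):,} pixels processed")
```

Because no tile depends on another, adding workers scales the throughput until cost, power or the final merge step becomes the limit.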

Exactly, this is a trivially parallelized problem. My iPhone 11 Pro can easily do real-time face boundary display with dozens of faces at once on a 12MP sensor, and that's not its sole purpose; if all it cared about was face boundary identification, I'm sure it could do a lot more. With custom ASICs, the face boundary checking could be embedded alongside the sensor array in this mega camera sensor setup. Then once a face boundary is identified, you send just those pixels off to a GPU array to do the face identification.
Other things will probably limit visibility, like earth curvature, buildings, etc. It's probably better to have multiple of those arrays? I would be really curious to get details on the lens setups for this array though.
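
To get a feel for how much data the "send just those pixels" step would save, here is a rough sketch. The 100x100 px face size is an assumption, and overlapping boxes are simply counted twice.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def cropped_pixels(boxes: List[Box]) -> int:
    """Total pixels inside the face boxes (overlaps counted twice)."""
    return sum((right - left) * (bottom - top)
               for left, top, right, bottom in boxes)


def forwarded_fraction(boxes: List[Box], frame_pixels: int) -> float:
    """Share of the full frame that would be sent on for identification."""
    return cropped_pixels(boxes) / frame_pixels


# Assumptions: a 500 MP frame containing 10,000 faces of 100x100 px each.
frame_pixels = 500_000_000
boxes = [(0, 0, 100, 100)] * 10_000   # positions don't matter for the count

print(f"{cropped_pixels(boxes):,} pixels forwarded")
print(f"{forwarded_fraction(boxes, frame_pixels):.0%} of the frame")
```

With sparser crowds or smaller faces the forwarded share drops further, which is the point of doing boundary detection next to the sensor.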