Good question. With Safari, we use the visible domain to help score the confidence for mobile websites. For native apps, we don't have any such "easy" confidence boost, so we have to fingerprint the app based on different features present in the captured image.
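To make the "confidence boost" idea concrete, here is a minimal sketch in Python. It is not the app's actual scoring code: the function name, the `DOMAIN_BOOST` value, and the idea of adding a fixed bonus to a visual match score are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical bonus applied when the domain visible in the screenshot
# matches the candidate site. The value is an assumption, not a real setting.
DOMAIN_BOOST = 0.25


def site_confidence(visual_score: float,
                    visible_domain: Optional[str],
                    candidate_domain: str) -> float:
    """Combine a visual match score in [0, 1] with an optional domain hint.

    visible_domain is None for native apps, where no address bar is shown,
    so the score rests on visual features alone.
    """
    score = visual_score
    if visible_domain is not None and visible_domain.lower() == candidate_domain.lower():
        score += DOMAIN_BOOST
    # Clamp so the boosted score stays a valid confidence.
    return min(score, 1.0)
```

In this sketch a Safari snap with a matching domain gets a higher score than the same image taken from a native app, which has to rely on the visual fingerprint only.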
The system is based on computer vision and machine learning, so even on novel sites it will get better over time with more usage and training. That said, we've already trained it on a bunch of the most popular sites.
My strong guess is that they're training against a fixed set of product images, possibly obtained by massive scraping. Another part of training or processing might involve a reverse image search API, like TinEye, and gathering metadata from the pages that contain the result images.
No, we do some client-side prefiltering to ignore non-products, followed by more extensive server-side filtering. You can also cancel snap processing, or switch to an "Ignore new snaps" mode via the settings in the app. On top of that, the upload only happens when the app is foregrounded, giving you further control.
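The client-side controls described above could be combined into a single upload gate. The following is a rough Python sketch under stated assumptions: `SnapState`, its field names, and `should_upload` are hypothetical and do not come from the app itself, and the server-side filtering step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SnapState:
    """Hypothetical client-side state for one captured snap."""
    looks_like_product: bool  # result of the client-side prefilter
    cancelled: bool           # user cancelled processing for this snap
    ignore_new_snaps: bool    # "Ignore new snaps" mode is on in settings
    app_foregrounded: bool    # the app is currently in the foreground


def should_upload(state: SnapState) -> bool:
    """Return True only if every client-side check allows the upload."""
    return (
        state.looks_like_product
        and not state.cancelled
        and not state.ignore_new_snaps
        and state.app_foregrounded
    )
```

Under these assumptions a snap is uploaded only when all four conditions agree, so any one of the user-facing controls is enough to stop it.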