Good question: with Safari, we use the visible domain to help score confidence for mobile websites. For native apps, we don't have any such "easy" confidence boost, so we have to fingerprint the app based on features present in the captured image.
The system is computer-vision / machine-learning based, so even on novel sites it will get better over time with more usage and training. We've already trained it on a bunch of the most popular sites, though.
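For intuition only, here's a toy sketch of one way image-based fingerprinting can work: compute a perceptual hash of a capture and compare it against hashes of known apps. This is not the actual system described above (which is ML-based); the average-hash approach, the 4x4 "screenshots", and all names here are illustrative assumptions.

```python
# Hypothetical sketch: match a screenshot to a known app via average hashing.
# A real system would use learned features, not a hand-rolled hash.

def average_hash(pixels):
    """Hash a 2D grid of grayscale values: each pixel contributes a
    1 bit if it is brighter than the mean, else a 0 bit."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(a, b):
    """Count differing bits between two hashes of equal length."""
    return sum(x != y for x, y in zip(a, b))

# Toy 4x4 grayscale "screenshots": a known app's UI, a slightly noisy
# capture of the same UI, and an unrelated layout.
known = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
capture = [
    [190, 210, 20, 5],
    [205, 195, 15, 12],
    [8, 14, 198, 205],
    [12, 9, 210, 190],
]
unrelated = [[10] * 4, [200] * 4, [10] * 4, [200] * 4]

d_match = hamming(average_hash(known), average_hash(capture))
d_other = hamming(average_hash(known), average_hash(unrelated))
print(d_match, d_other)  # noisy capture hashes much closer than the unrelated UI
```

The noisy capture survives small pixel-level perturbations because only the brighter-than-mean pattern matters, which is roughly the robustness property a fingerprinting system needs.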
I would strongly guess it's a fixed set of product images they're training against, possibly obtained by large-scale scraping. Another part of training or processing might use a reverse image search API, like TinEye, and gather metadata from the pages that contain the result images.