by ro_dobre
756 days ago
There are definitely some odd ones. The screenshots were taken a few hours after the initial crawl, and some pop-ups may have appeared during that first crawl. The crawler also blocks image loading so that image colors aren't counted: it screenshots the entire page and builds a pixel-wise frequency count of each color from the screenshot. Because of that, some pages may have shown fallback CSS instead of their images. I definitely have to refine it more; it's far from perfect. Only ~25k websites are indexed so far, taken from a list of the 1M most popular websites that the developers of a Chrome extension compiled from what most people were visiting. I tried my best to sort by popularity, but I'm sure the list isn't perfect. Plus, some websites are really unfriendly to crawlers, and no matter how much you try to hide it (faking user agents, using realistic window dimensions, proxies, etc.), some will still detect you.
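For anyone curious what "a frequency of each color from the screenshot, pixel-wise" amounts to, here is a minimal sketch. It is not the actual crawler code: it assumes the screenshot has already been decoded into a flat list of RGB tuples, one per pixel (with Pillow, for example, `Image.getdata()` yields such tuples for an RGB image), and the function name `color_frequencies` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def color_frequencies(pixels: Iterable[RGB]) -> List[Tuple[RGB, float]]:
    """Return (color, share of all pixels) pairs, most frequent color first.

    Assumption: `pixels` is the decoded full-page screenshot, one RGB tuple
    per pixel. No color quantization is done, so near-identical shades are
    counted as distinct colors.
    """
    counts = Counter(pixels)          # exact count per distinct color
    total = sum(counts.values())      # total number of pixels
    if total == 0:
        return []
    # most_common() orders colors by count, highest first
    return [(color, n / total) for color, n in counts.most_common()]


# A tiny 2x2 "screenshot": three white pixels and one dark-blue pixel.
shot = [(255, 255, 255), (255, 255, 255), (20, 30, 90), (255, 255, 255)]
print(color_frequencies(shot))
# [((255, 255, 255), 0.75), ((20, 30, 90), 0.25)]
```

This also shows why blocking images matters for such a count: a single large hero photo can contribute thousands of distinct colors and swamp the palette that the page's CSS actually defines.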