It's a neat idea, and I bookmarked it just for ideas on possible designs. There are some weird parts, though. It never seems to return a website I have heard of. And (although it may have nothing to do with you, and everything to do with the modern web) a lot of them look rather algorithmically designed. Plus, there are sites like these people: https://www.beltnbags.com.au/ that show up on every pure-tone and near-pure-tone search because they're listed as (#0000ff, 24.9%, #f7f7ff, 13.0%, #f7f7f7, 13.0%, #00ff00, 24.2%, #ff0000, 24.9%). However, if you go to their website, I don't know what you're scraping, 'cause it sure does not "look" like it's 1/4 red, 1/4 green, 1/4 blue, and 1/4 dueling eggshell white. It looks white and black, with teeny red labels and lots of brown clothes.
The crawler also blocks image loading so that image colors aren't taken into account: it screenshots the entire page and builds a frequency count of each color from the screenshot, pixel by pixel. So it's possible that some fallback CSS was displayed in place of the images, and those fallback colors are what got counted.
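To make the pixel-wise counting concrete, here is a rough sketch in Python. It is not the crawler's actual code: the function name `color_frequencies` and the toy pixel list are made up for illustration, and it assumes the screenshot has already been decoded into a list of (r, g, b) tuples by whatever imaging library you use.

```python
from collections import Counter

def color_frequencies(pixels, top=5):
    """Return the `top` most common colors as (hex, percent) pairs.

    `pixels` is any iterable of (r, g, b) tuples; decoding the
    screenshot into that form is assumed to happen elsewhere.
    """
    counts = Counter(pixels)          # one count per distinct color
    total = sum(counts.values())      # total number of pixels seen
    if total == 0:
        return []
    return [
        ("#{:02x}{:02x}{:02x}".format(r, g, b), round(100.0 * n / total, 1))
        for (r, g, b), n in counts.most_common(top)
    ]

# Toy "screenshot" of four pixels (hypothetical data, not a real page).
pixels = [(0, 0, 255)] * 2 + [(255, 0, 0)] + [(247, 247, 247)]
print(color_frequencies(pixels))
```

With counting like this, a large image area that renders as a solid fallback background would contribute that one color for every pixel it covers, which could explain a page being reported as roughly a quarter pure red, green, and blue.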
I definitely have to refine it more; it's far from perfect. Only ~25k websites are indexed so far. They come from a list of the 1M most popular websites, compiled by the developers of a Chrome extension based on what most people were visiting.
I tried my best to sort by popularity, but I'm sure the list is not perfect. Plus, some websites are really unfriendly to crawlers and, as much as you try to hide the crawler (by faking user agents, using realistic window dimensions, proxies, etc.), some will still detect you.