by araes 756 days ago
It's a neat idea, and bookmarked it just for ideas on possible designs.

Some weird parts, though. Does not ever seem to return a website I have ever heard of. And (although it may have nothing to do with you, and everything to do with the modern web) a lot of them look rather algorithmically designed.

Plus, there are sites like this one: https://www.beltnbags.com.au/ that show up in every pure-tone and near-pure-tone search because they're listed as (#0000ff, 24.9%, #f7f7ff, 13.0%, #f7f7f7, 13.0%, #00ff00, 24.2%, #ff0000, 24.9%)

However, if you go to their website, I don't know what you're scraping, because it sure does not "look" like it's 1/4 red, 1/4 green, 1/4 blue, and 1/4 dueling eggshell whites. It looks white and black, with teeny red labels and lots of brown clothes.

2 comments

There are definitely some odd ones. The screenshots were taken a few hours after the initial crawl, and some pop-ups may have appeared during that first crawl.

The crawler also blocks image loading so that image colors aren't taken into account: it screenshots the entire page and builds a pixel-wise frequency count of each color in the screenshot. So it's possible that some fallback CSS was displayed instead of the images.
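
For anyone curious what a pixel-wise color frequency looks like, here is a rough sketch. It is not the crawler's actual code. It assumes the screenshot has already been decoded into a flat list of (r, g, b) tuples, and `color_frequencies` is a made-up name for illustration.

```python
from collections import Counter


def color_frequencies(pixels, top=5):
    """Return the `top` most common colors as (hex string, percent) pairs.

    `pixels` is assumed to be a flat list of (r, g, b) tuples, one per
    screenshot pixel. Hypothetical helper, not the site's real code.
    """
    counts = Counter(pixels)
    total = len(pixels)
    result = []
    for (r, g, b), n in counts.most_common(top):
        # Format the color as #rrggbb and express its share as a percentage.
        result.append((f"#{r:02x}{g:02x}{b:02x}", round(100 * n / total, 1)))
    return result


# Toy 2x2 "screenshot": three white pixels and one blue pixel.
pixels = [(255, 255, 255)] * 3 + [(0, 0, 255)]
print(color_frequencies(pixels))
```

With a count like this, a large solid-color fallback block can easily dominate the result, even if a visitor with images enabled never sees it.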

I definitely have to refine it more and it's far from perfect. There are only ~25k websites indexed, and I used a list of the 1M most popular websites created by the developers of a Chrome extension, based on what most people were visiting.

I tried my best to sort by popularity but I'm sure that the list is not perfect. Plus, some websites are really unfriendly to crawlers and, as much as you try to hide it (by faking user agents, using realistic window dimensions, proxies, etc.), some will still detect you.

> Does not ever seem to return a website I have ever heard of.

Just playing devil's advocate, there are a LOT of domains out there. A quick search says there are still well over a billion websites on the internet.

Totally realize there's a lot of websites. Just figured with fairly simplistic color tone choices, you might get something like Google 'occasionally'.

Google as example: Current logo: #4086F4 (blue), #EB4132 (red), #FBBD00 (yellow), #31AA52 (green). Google (at least on my search) only shows up as a leftover of somebody else's search. Of course, there's also only 9 results.
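
To put rough numbers on how far those logo colors sit from the pure tones, here is a small sketch. How the site actually matches colors isn't stated anywhere in this thread, so plain Euclidean distance in RGB is purely an assumption, and `hex_to_rgb` / `rgb_distance` are made-up helper names.

```python
import math


def hex_to_rgb(h):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two hex colors in RGB space.

    Assumption: the real search may use a different metric entirely.
    """
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


PURE_TONES = {"red": "#ff0000", "green": "#00ff00", "blue": "#0000ff"}
GOOGLE_LOGO = ["#4086F4", "#EB4132", "#FBBD00", "#31AA52"]

for color in GOOGLE_LOGO:
    # Find the pure tone closest to this logo color.
    name, dist = min(
        ((n, rgb_distance(color, p)) for n, p in PURE_TONES.items()),
        key=lambda pair: pair[1],
    )
    print(color, "is nearest to pure", name, "at distance", round(dist, 1))
```

For scale, the largest possible distance in this space (black to white) is about 441.7, so the printed distances give a rough idea of how "near pure-tone" each logo color really is.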