| HN Mirror


	by dynode 3565 days ago
	Replying to myself with a quote from my article: "It’s easy to notice a bug when examining the colors for Google (note: this is the normal google.com, not a doodle). Notice how the three colors are light gray, dark gray, and white, not the typical red, green, blue, yellow color scheme. Why? Well, when the screenshot is resized to 320 x 240 pixels for processing, the colors are dithered. The number of pixels in the new image that lie between red, green, blue, yellow and white (the dominant background color) is much larger than the number of pixels that are colored. Because of dithering, those in-between pixels are closer to shades of gray than to colors, and thus the k-means clustering (with EM) finds shades of gray and white to be the “color of Google”. I’m not sure if this is a bug... what do you think?"
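
To illustrate the effect described in the quote, here is a minimal sketch, not the article's actual code. It assumes the downscale behaves like a simple box average over neighboring pixels, and it uses pure red, green, and blue rather than Google's real logo colors. `box_average` and `saturation` are hypothetical helper names.

```python
def box_average(pixels):
    """Average a block of RGB pixels, as a simple box-filter downscale would."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) // n for c in range(3))


def saturation(rgb):
    """HSV-style saturation: 0.0 is a pure gray, 1.0 is a fully saturated color."""
    hi, lo = max(rgb), min(rgb)
    return 0.0 if hi == 0 else (hi - lo) / hi


# Assumed colors for illustration only.
RED, GREEN, BLUE, WHITE = (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)

# One output pixel that covers a mix of logo colors and white background.
blended = box_average([RED, GREEN, BLUE, WHITE])
print(blended, saturation(blended))
```

Under these assumptions the blended pixel has equal red, green, and blue channels, so it is a gray with zero saturation even though three of the four inputs were fully saturated. A block containing only one color plus white would come out as a pale tint instead, so how gray the result gets depends on how many different colors each output pixel covers.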

2 comments

paulhebert 3565 days ago

Hey Andy,

That's awesome! I figured someone else must have had the same idea before me. :)

I think your screenshot scraping technique is probably more accurate than my text parsing. I also like that you used a larger sample size. I plan to experiment with groups of 100 and 1000.

Thanks for sharing! It's always interesting to see how different people achieve similar goals.

I'd like to begin scraping the images on the sites soon too. When I've got a good chunk of time I'll look through your source code for inspiration. Mind if I reach out with questions when I do?

EDIT: I also really enjoy those woodblock prints! Now I want to somehow print my data for the top ten sites onto canvas.

dynode 3565 days ago

Sure. I think the git repo is dead, but I'll resurrect it if you're interested.

paulhebert 3565 days ago

Yeah, that would be great. Thanks!

im4w1l 3565 days ago

Rather than resizing to 320x240, sample that many pixels (76,800) at random from the full-size screenshot. For even better results, use some method of variance reduction, e.g. divide the screen into n squarish rectangles and pick N/n pixels from every rectangle, where N is the total number of pixels you want to sample.
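
A minimal sketch of the stratified sampling idea above, assuming the screenshot is addressed by integer (x, y) coordinates and the grid is given as columns by rows. The function name and parameters are hypothetical, and it only returns coordinates; reading the pixel values is left to whatever image library is in use.

```python
import random


def stratified_pixel_coords(width, height, total, cols, rows, rng=None):
    """Pick roughly `total` pixel coordinates, spread evenly over a cols x rows grid.

    Each grid cell contributes total // (cols * rows) uniformly random pixels,
    so no region of the screen is over- or under-represented by chance.
    Assumes cols <= width and rows <= height so every cell is non-empty.
    """
    rng = rng or random.Random()
    per_cell = total // (cols * rows)
    coords = []
    for r in range(rows):
        # Integer cell boundaries that cover the full height without gaps.
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            for _ in range(per_cell):
                coords.append((rng.randrange(x0, x1), rng.randrange(y0, y1)))
    return coords


# Example: 76,800 samples (as many pixels as a 320x240 image) from an
# assumed 1280x960 screenshot, using a 16x12 grid of 80x80 cells.
samples = stratified_pixel_coords(1280, 960, 320 * 240, cols=16, rows=12)
```

Because every sampled pixel keeps its original color, no blended in-between colors are introduced, which is the point of sampling instead of resizing. Pixels are drawn with replacement here, so a cell may occasionally return the same coordinate twice.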