by dynode 3565 days ago
Replying to myself with a quote from my article:

"It’s easy to notice a bug when examining the colors for Google (note, this is normal google.com not a doodle). Notice how the three colors are light gray, dark gray, and white – not the typical red, green, blue, yellow color scheme. Why? Well, when the image screenshot is resized to 320 x 240 pixels for processing, the colors are dithered. The number of pixels in the new image that lie between red, green, blue, yellow and white – the dominant background color – is much larger than the number of pixels that are colored. Because of dithering, those between pixels are closer to shades of gray, than colors, and thus the k-means clustering (with EM) finds shades of gray and white to be the “color of Google”. I’m not sure if this is a bug.. what do you think?"
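To make the effect in that quote concrete, here is a rough sketch of what happens to a single colored pixel when a downscale averages it with its white neighbors. It assumes a plain box filter (the actual resampling method isn't stated), and `box_average` and `saturation` are hypothetical helper names, not anything from the article's code.

```python
import colorsys

def box_average(pixels):
    """Average (r, g, b) tuples channel by channel, like a box-filter downscale."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))

def saturation(rgb):
    """HSV saturation in [0, 1] for an 8-bit (r, g, b) tuple."""
    r, g, b = (v / 255 for v in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]

RED, WHITE = (255, 0, 0), (255, 255, 255)

# One logo-colored pixel merged with three background pixels (assumed 2x2 block).
blended = box_average([RED, WHITE, WHITE, WHITE])

print(blended, saturation(RED), round(saturation(blended), 2))
```

The blended pixel keeps only about a quarter of the original saturation, so a clustering step that sees many such pixels has little strong color left to latch onto.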

2 comments

Hey Andy,

That's awesome! I figured someone else must have had the same idea before me. :)

I think your screenshot scraping technique is probably more accurate than my text parsing. I also like that you used a larger sample size. I plan to experiment with groups of 100 and 1000.

Thanks for sharing! It's always interesting to see how different people achieve similar goals.

I'd like to begin scraping the images on the sites soon too. When I've got a good chunk of time I'll look through your source code for inspiration. Mind if I reach out with questions when I do?

EDIT: I also really enjoy those woodblock prints! Now I want to somehow print my data for the top ten sites onto canvas.

Sure. I think the git repo is dead, but I'll resurrect it if you're interested.
Yeah, that would be great. Thanks!
Rather than resizing to 320x240, sample that many pixels (320 x 240 = 76,800) at random from the full-size screenshot. For even better results, use a variance-reduction method such as stratified sampling: divide the screen into n squarish rectangles and pick N/n pixels from each, where N is the total number of samples.
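A minimal sketch of the rectangle-based sampling suggested above (a form of stratified sampling), using only the standard library. The function name, the grid parameters, and the 1280x960 screenshot size are assumptions for illustration; it returns coordinates only, so reading the actual pixel values is left to whatever image library is in use.

```python
import random

def stratified_pixel_coords(width, height, total, rows, cols, seed=None):
    """Pick about `total` (x, y) coordinates, spread evenly over a rows x cols grid.

    Assumes width >= cols and height >= rows so that no cell is empty.
    Coordinates are drawn with replacement, so a pixel can repeat within a cell.
    """
    rng = random.Random(seed)
    per_cell = total // (rows * cols)  # N/n samples per rectangle
    coords = []
    for r in range(rows):
        # Integer bounds that tile the image exactly, even when sizes don't divide evenly.
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            for _ in range(per_cell):
                coords.append((rng.randrange(x0, x1), rng.randrange(y0, y1)))
    return coords

# 76,800 samples (the same count as a 320x240 image) from an assumed 1280x960 screenshot.
coords = stratified_pixel_coords(1280, 960, 76800, rows=12, cols=16, seed=0)
```

Because every sampled pixel is an original, unblended pixel, the clustering step sees the true logo colors, and the grid keeps any one region of the page from being over- or under-represented by chance.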