by prknight 5372 days ago
In my opinion, increasing the size of the color-name dataset doesn't really address this problem. I looked at a fairly large set of 5000+ colors and discarded it because it introduced too many exotic-sounding names that only a color expert can tell apart. It wasn't a UI improvement, and it still didn't give consistent results for the darkest and lightest colors.

I think working with HSL values works better in principle. Adjusting just the lightness value can lead to a more sensible color description. The most common lightness values for color names hover around 50% (at least in the 3000+ dataset I compiled). One solution is a simple formula that takes into account how accurate the nearest color-name match is and, if the match is too inaccurate, looks up the color name for the color's HSL with the L adjusted to 50%. I guess you could call it staying true to the hue. I think that would make the color naming script even more useful.
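
A minimal sketch of that fallback idea, under stated assumptions: the tiny palette, the `max_distance` threshold, plain Euclidean distance in RGB, and all function names are made up for illustration and are not the actual color naming script.

```python
import colorsys

# Illustrative palette only (an assumption); a real script would load
# a dataset with thousands of named colors.
NAMED_COLORS = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance in RGB space (a simple stand-in metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_name(rgb):
    """Return (name, distance) of the closest palette entry."""
    name = min(NAMED_COLORS, key=lambda n: rgb_distance(rgb, NAMED_COLORS[n]))
    return name, rgb_distance(rgb, NAMED_COLORS[name])


def name_color(hex_color, max_distance=30.0):
    """Name a color; if the nearest match is too far away, retry with
    the same hue and saturation but lightness forced to 50%."""
    rgb = hex_to_rgb(hex_color)
    name, dist = nearest_name(rgb)
    if dist <= max_distance:
        return name
    # colorsys works on floats in 0..1 and orders the tuple as (h, l, s).
    h, _l, s = colorsys.rgb_to_hls(*(c / 255 for c in rgb))
    adjusted = tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, 0.5, s))
    return nearest_name(adjusted)[0]
```

With this palette, `name_color("#07250b")` falls back to the 50%-lightness lookup and lands on "green", while a very loose threshold such as `max_distance=1000` keeps the plain nearest match, "black". The right threshold would have to be tuned against the real dataset.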

1 comments

One solution could be to use several points for each label (color name). For instance, we could use a limited set of color names (red, green, blue, etc.) but have a lot of points associated with each of these labels. For example, different points like #07250b (which is, in my opinion, misclassified) and #51f665 could share the same label: green. The main problem is that I'm not sure it is possible to find such a dataset on the internet. Maybe we could build one from sites like http://cloford.com/resources/colours/500col.htm by removing the numbers from each color name.

With several points for each label, we could use a 3NN classifier, for example, instead of a 1NN classifier. That should also change the results, but I'm not sure it would really improve them.
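
A rough sketch of what that could look like, assuming Euclidean distance in RGB and a hand-made toy dataset. Only #07250b and #51f665 come from the example above; every other point and all function names are invented for illustration.

```python
from collections import Counter

# Several points per label. Apart from #07250b and #51f665, these
# entries are made-up placeholders, not a real dataset.
LABELED_POINTS = [
    ("#07250b", "green"), ("#51f665", "green"), ("#008000", "green"),
    ("#ff0000", "red"), ("#8b0000", "red"), ("#ff6347", "red"),
    ("#0000ff", "blue"), ("#000080", "blue"), ("#87ceeb", "blue"),
]


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def squared_distance(a, b):
    """Squared Euclidean distance; enough for ranking neighbors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(hex_color, k=3):
    """Label a color by majority vote among its k nearest points."""
    query = hex_to_rgb(hex_color)
    ranked = sorted(
        LABELED_POINTS,
        key=lambda item: squared_distance(query, hex_to_rgb(item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, most_common keeps first-seen order, so the label of
    # the closest neighbor wins.
    return votes.most_common(1)[0][0]
```

Calling `classify(color, k=1)` gives the plain nearest-neighbor behavior, so the two can be compared on the same data. Whether k=3 actually helps would depend on how dense and how well labeled the real dataset is.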