by thesimon 4003 days ago
Worked on something similar at a local open data hackathon before, but instead I used a scraper to pull the logos out of each site's HTML.

https://github.com/c0dr/LogoParser
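
For a rough idea of what a scraper like this can look like, here is a minimal sketch. It is not the actual LogoParser code; the "look for `logo` in the `src`, `alt`, `class` or `id` of an `<img>`" heuristic and the names `LogoCandidateParser` and `find_logo_candidates` are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LogoCandidateParser(HTMLParser):
    """Collects <img> tags whose attributes mention "logo" (assumed heuristic)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # Attribute values can be None (e.g. <img hidden>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        src = attr.get("src", "")
        if not src:
            return
        # Search the attributes where sites commonly put the word "logo".
        haystack = " ".join(
            attr.get(name, "") for name in ("src", "alt", "class", "id")
        ).lower()
        if "logo" in haystack:
            # Resolve relative paths against the page URL.
            self.candidates.append(urljoin(self.base_url, src))


def find_logo_candidates(html, base_url):
    """Return absolute URLs of images that look like logos, in page order."""
    parser = LogoCandidateParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.candidates
```

A heuristic like this only helps on sites that actually label their logo in the markup, which would fit with it working on only a fraction of sites.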

It worked okay for about 40% of the sites. For the rest, we used Python and scikit-learn to detect the logo on the page: we threw all of the page's images into the script, and it returned whether each one was a logo or not. That actually worked quite well; IIRC it got over 90% of the test cases right.

https://github.com/tomsrocket/image-classification
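
To illustrate the "logo or not" classification step, here is a minimal scikit-learn sketch. It is not the code from the repository above: the features (width, height, vertical position, whether the file name contains "logo"), the decision tree model, the toy training data, and the function names are all assumptions chosen to keep the example small.

```python
from sklearn.tree import DecisionTreeClassifier


def image_features(img):
    """Turn one image's metadata into a numeric feature vector (assumed features)."""
    return [
        img["width"],
        img["height"],
        img["top"],  # vertical offset on the page, in pixels
        1 if "logo" in img["src"].lower() else 0,
    ]


# Made-up training examples, for illustration only: (metadata, is_logo).
TRAINING = [
    ({"src": "/logo.png", "width": 200, "height": 60, "top": 10}, 1),
    ({"src": "/img/site-logo.svg", "width": 150, "height": 50, "top": 20}, 1),
    ({"src": "/img/header.png", "width": 180, "height": 70, "top": 15}, 1),
    ({"src": "/logo-small.gif", "width": 120, "height": 40, "top": 5}, 1),
    ({"src": "/photos/team.jpg", "width": 800, "height": 600, "top": 400}, 0),
    ({"src": "/photos/office.jpg", "width": 1024, "height": 768, "top": 900}, 0),
    ({"src": "/ads/box.gif", "width": 300, "height": 250, "top": 600}, 0),
    ({"src": "/blog/chart.png", "width": 640, "height": 480, "top": 1200}, 0),
]


def train_logo_classifier(examples):
    """Fit a small decision tree on (metadata, label) pairs."""
    X = [image_features(img) for img, _ in examples]
    y = [label for _, label in examples]
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(X, y)
    return clf


def find_logos(clf, images):
    """Return the src of every image the classifier labels as a logo."""
    if not images:
        return []
    predictions = clf.predict([image_features(img) for img in images])
    return [img["src"] for img, label in zip(images, predictions) if label == 1]
```

Usage would be along the lines of `find_logos(train_logo_classifier(TRAINING), page_images)`, where `page_images` is a list of metadata dicts for every image found on the page. A real setup would need far more labeled examples and probably pixel-based features as well.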

But yeah, using Twitter as a source might also be a good idea.