HN Mirror


by jkldotio 4884 days ago

Having worked with a lot of ML guys who were ahead of Google on numerous fronts, I have to agree. With Knol's death, Google failed to control Wikipedia, arguably one of the more important ML datasets. People can fire up the Common Crawl on demand from Amazon. Anyone who thinks Google is the real bleeding edge just isn't browsing recent academic papers.

I've got no formal CS training, and if I get funding for jkl.io the objective is to have (most of) a Google News (English) competitor implemented in a year, part-time. Google has thousands of ML employees, but there are three million users on GitHub. If I need facial recognition, it's on GitHub. Topic modelling to layer on top of my NLP, or to aid in entity resolution: on GitHub. Crawlers, got it. Next-gen databases (http://hyperdex.org/), got it. The jkl.io site is only just over 1000 lines of code written by me at the moment, but it probably uses tens of thousands from the Python libraries alone, before we even talk about the DB and the OS.

The more people understand the filter bubble and information diet concepts, the more personalisation will be a thing only for side interests and friendship networks. I don't think people want black-box, advertising-oriented algorithms manipulating their political and economic news. Without that personalisation, the computation required for me is so much smaller and cheaper. I know it's not HN's focus, because people want their exit money, but donation models, as Wikipedia beating Knol shows, can actually be the most efficient solution in many domains where you can't trust a corporation with a fiduciary duty to maximize shareholder profit.

People might say "but what about really huge data like location services using not just GPS, but mobile data and wifi response times, pictures from Google's new alt-reality game and street view"; they might say "Google just can't be caught up to" and point to the failure of Apple's maps. But I worked with some guys who scaled a solution using SIFT features => Lucene that could geo-locate instantly on massive datasets of images. You can prove an algorithm can scale theoretically without having 10,000 machines to run it on. One of the key points separating computer science from just programming is the analysis of algorithms in theoretical terms. Apple's failure was because they are primarily a luxury product company, not an ML company, but people just think "technology". Even so, Apple can get stuff done, or buy companies that can (Siri). Microsoft, Yandex, Yahoo, Amazon, huge rising data powers in Asia, thousands of computer science professors, tens of thousands of post docs and doctoral students, and millions of GitHub tinkerers are not going to fall behind. Google isn't even the major search engine in a lot of countries.
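
To make the SIFT features => Lucene idea a bit more concrete, here is a rough sketch of the general technique as I understand it, not the actual system those guys built. Each local descriptor gets quantized to its nearest "visual word", the words go into an inverted index (a plain dict standing in for Lucene here), and a query image votes for the indexed images that share its words. The codebook, the 2-d toy descriptors, and the names `quantize` and `GeoIndex` are all made up for illustration; real SIFT descriptors are 128-dimensional and a real codebook would be learned from data.

```python
from collections import Counter, defaultdict

# Hypothetical toy codebook: each centroid stands for one "visual word".
# Assumption: a real system would learn many centroids from SIFT descriptors.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(descriptor):
    """Return the index of the nearest centroid (the descriptor's visual word)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
    return min(range(len(CODEBOOK)), key=lambda i: sq_dist(CODEBOOK[i]))


class GeoIndex:
    """Inverted index from visual word to image ids (a stand-in for Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # visual word -> {image_id}
        self.locations = {}               # image_id -> (lat, lon)

    def add(self, image_id, descriptors, location):
        """Index an image's visual words and remember where it was taken."""
        self.locations[image_id] = location
        for d in descriptors:
            self.postings[quantize(d)].add(image_id)

    def locate(self, descriptors):
        """Vote for indexed images sharing visual words with the query."""
        votes = Counter()
        for word in {quantize(d) for d in descriptors}:
            for image_id in self.postings.get(word, ()):
                votes[image_id] += 1
        if not votes:
            return None
        best_id, _ = votes.most_common(1)[0]
        return self.locations[best_id]


index = GeoIndex()
index.add("tower", [(0.1, 0.0), (0.9, 0.1)], (48.8584, 2.2945))
index.add("clock", [(0.1, 0.9), (0.9, 0.9)], (51.5007, -0.1246))
guess = index.locate([(0.05, 0.1), (1.1, -0.1)])
```

The point for the scaling argument is that a query only touches the posting lists for its own visual words, so the cost depends on how long those lists are rather than on scanning every indexed image. That is the kind of property you can reason about on paper before you have the machines.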