HN Mirror

by nimithryn 2714 days ago

>>>Don't expect the industry to lead this effort, though. The industry sees the reliance on large datasets as something to be exploited for a competitive advantage.

This is only true for the Facebooks and Googles of the world. There are definitely small companies (like the one I work for) trying very hard to figure out how to build models that use less data because we don't have access to those large datasets.

The industry is larger than just the Big N.

YeGoblynQueenne 2714 days ago

Btw, if you have relational data and a few good people with strong computer science backgrounds (rather than statisticians or mathematicians), have a look at Inductive Logic Programming. ILP is a set of machine learning techniques that learn logic programs from examples and background knowledge that are themselves expressed as logic programs. Its sample efficiency is in a class of its own, and it generalises robustly from very little data[1].

I study ILP algorithms for my PhD. My research group has recently developed a new technique, Meta-Interpretive Learning. Its canonical implementation is Metagol:

https://github.com/metagol/metagol

Please feel free to email me if you need more details. My address is in my profile.

___________________

[1] As a source for this claim I always quote this DeepMind paper, in which Metagol is compared to the authors' own system (itself an ILP system, but one using a deep neural net):

https://arxiv.org/abs/1711.04574

> ILP has a number of appealing features. First, the learned program is an explicit symbolic structure that can be inspected, understood, and verified. Second, ILP systems tend to be impressively data-efficient, able to generalise well from a small handful of examples. The reason for this data-efficiency is that ILP imposes a strong language bias on the sorts of programs that can be learned: a short general program will be preferred to a program consisting of a large number of special-case ad-hoc rules that happen to cover the training data. Third, ILP systems support continual and transfer learning. The program learned in one training session, being declarative and free of side-effects, can be copied and pasted into the knowledge base before the next training session, providing an economical way of storing learned knowledge.
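The language-bias argument in the quoted passage can be illustrated with a toy sketch in Python. This is not Metagol or meta-interpretive learning: it simply brute-forces every clause of one hypothetical chain-shaped template, `target(A,B) :- Q(A,C), R(C,B)`, over made-up background relations (`parent`, `married`, `target` and all the constants are invented for illustration), and keeps the clauses that cover all positive examples and no negative ones. Because the template only admits short, general rules, two positive examples are enough to single one out.

```python
from itertools import product

# Hypothetical background knowledge: each binary relation is a set of (x, y) pairs.
BACKGROUND = {
    "parent": {("ann", "bob"), ("bob", "cal"), ("cal", "dee"),
               ("eve", "fay"), ("fay", "gus")},
    "married": {("ann", "ed"), ("ed", "ann")},
}


def chain(q, r):
    """Pairs (a, b) covered by the clause  target(A,B) :- q(A,C), r(C,B)."""
    return {(a, b)
            for (a, c1) in BACKGROUND[q]
            for (c2, b) in BACKGROUND[r]
            if c1 == c2}


def learn_chain_clause(pos, neg):
    """Return every one-clause chain program consistent with the examples.

    The hypothesis space is restricted to a single chain clause over the
    background relations: this restriction is the (strong) language bias.
    """
    hypotheses = []
    for q, r in product(BACKGROUND, repeat=2):
        covered = chain(q, r)
        # Keep the clause if it covers all positives and none of the negatives.
        if pos <= covered and not (neg & covered):
            hypotheses.append(f"target(A,B) :- {q}(A,C), {r}(C,B).")
    return hypotheses


# Hypothetical examples: two positives and one negative.
positives = {("ann", "cal"), ("eve", "gus")}
negatives = {("ann", "bob")}
print(learn_chain_clause(positives, negatives))
```

Running it should print the single clause `target(A,B) :- parent(A,C), parent(C,B).` (a grandparent rule). That clause also covers `("bob", "dee")`, which was never given as an example, whereas a program that just memorised the two positives would not.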

nimithryn 2714 days ago

Ah yes, I am very familiar with ILP; thanks for sending these references!

YeGoblynQueenne 2714 days ago

You're welcome, and what a pleasant surprise! It's rare to find people in industry who know about ILP :)

YeGoblynQueenne 2714 days ago

You're absolutely right, and I appreciate that very much. On the other hand, there's an incredible amount of hype around Big Data and deep learning precisely because the large corporations are doing it. So now everyone wants to do it, whether or not they have the data for it and whether or not it really adds anything to their products.

As to the Big N (good one), what I meant is that I don't see them trying very hard to undo their own advantage by spending much effort on machine learning techniques that rely on, well, little data. That would truly democratise machine learning, much more so than releasing their tools for free and so on. But then, if everyone could do machine learning as well as Google, Facebook et al., where would that leave them?
