| HN Mirror



	by mappingbabeljc 3506 days ago
	Even more interesting: an OpenAI paper on roughly the same subject/technique came out a little earlier (RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, https://arxiv.org/pdf/1611.02779.pdf). I think parallel inventions tend to indicate that multiple people have stumbled on a similar good idea at the same time. (disclaimer: I work at OpenAI. v pleased to see these two papers emerge so close together)


gwern 3506 days ago

Or there's just so much work being done that people are stepping on each other's toes. There were like 5-10 papers at ICLR this year on just the same 'let's treat NN hyperparameters as an MDP' idea. Or look at the concrete distribution: two different Google groups published papers on almost exactly the same idea simultaneously, apparently in total ignorance of each other. Or the lipreading NNs of the past week: two Oxford groups working on different datasets with different architectures, both publishing >human results. The pace is so fast the left hand doesn't know what the right hand is doing. It's fun for the spectator (you have no idea what will come out tomorrow, much less next month or next year) but it strikes me as rather inefficient.


zhongwenxu 3505 days ago

In the Google and Oxford cases, I don't think it is common to broadcast your proposed ideas to colleagues in other groups before you actually start working on them. So I think this can happen even within the same company or university.

For the meta-learning papers, you may be interested in reading the related work section of the RL^2 paper https://arxiv.org/pdf/1611.02779.pdf.

Quoted as follows:

"Our work draws inspiration from a particular line of work (Younger et al., 2001; Santoro et al., 2016; Vinyals et al., 2016), which formulates meta-learning as an optimization problem, and can thus be optimized end-to-end via gradient descent."

"Another line of work (Hochreiter et al., 2001; Younger et al., 2001; Andrychowicz et al., 2016; Li & Malik, 2016) studies meta-learning over the optimization process. There, the meta-learner makes explicit updates to a parametrized model."

Both groups were inspired by the same works, applied the meta-learning idea to RL problems, and aimed for the same ICLR deadline. That still makes sense, right?


taeric 3506 days ago

Why do you call it inefficient? If one side were suppressing their papers because they were just slightly beaten to the press, I would be worried. As it is, I'd be more worried about breaking a system that is working, for the sake of efficiency, than about any losses from the system.

Stated more directly: do you think we could get better results somehow? What would those look like?


gwern 3506 days ago

If you've been beaten to the press, you might as well release your paper to salvage something from your sunk costs of time/effort/GPUs. It's inefficient because, in solving almost exactly the same problems, they are duplicating each other's work instead of sharing the intermediate steps. If there were more sharing of low-grade information, along the lines of 'I'm working on a lipreading CNN, it's going pretty well' 'oh hey we're working on a lipreading CNN too!', then the datasets and GPU clusters and math could be pooled and better single results released quicker. As it is, now you have to read two different papers about lipreading CNNs and puzzle over the differences, and two different papers about the concrete distribution trick, and they probably all came out a month or two later because everyone had to redo work for their separate system & paper.


taeric 3506 days ago

Duplicated work is not necessarily a sign of inefficiency, though.

My hope would be that they each learned something slightly different in solving the same problem. Eventually, things may converge to a single answer. However, there is no evidence that we should demand convergence at the beginning.

So, the shame here would be if folks are not comparing and contrasting the different solutions to the same problem. I confess I am guilty of that myself, in that I have not read both papers. But I will try, to see if it helps me understand.


oneman 3506 days ago

machine superintelligence is hyperintegration of the metasystem

(disclaimer: incomplete knowledge is risk)
