Hacker News

by Imnimo 889 days ago

I'm very curious how often the LM produces a helpful construction. Surely it must be doing better than random chance, but is it throwing out thousands of constructions before it finds a good one, or is it able to generate useful proposals at a rate similar to human experts?

They say in the paper, "Because the language model decoding process returns k different sequences describing k alternative auxiliary constructions, we perform a beam search over these k options, using the score of each beam as its value function. This set-up is highly parallelizable across beams, allowing substantial speed-up when there are parallel computational resources. In our experiments, we use a beam size of k = 512, the maximum number of iterations is 16 and the branching factor for each node, that is, the decoding batch size, is 32."

But I don't totally understand how 512 and 16 translate into the total number of constructions proposed. They also note that ablating beam size and max iterations seems to only somewhat degrade performance. Does this imply that the model is actually pretty good at putting helpful constructions near the top, and only for the hardest problems does it need to produce thousands?
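To make the counting question concrete, here is a generic beam search in Python. It is a toy sketch, not the paper's implementation: `beam_search`, `toy_expand` and the use of `sum` as the score are hypothetical stand-ins for the language model's decoding step and the beam score. If every node in a full beam were expanded with the full branching factor at every iteration, beam size × branching factor × iterations (512 × 32 × 16 = 262,144) would be an upper bound on proposals; because this toy search starts from a single root, the beam is not full for the first two iterations, so its count comes out lower.

```python
import heapq
from typing import Callable, List, Tuple

# A search state: the sequence of (hypothetical) construction ids chosen so far.
State = Tuple[int, ...]


def beam_search(
    initial: State,
    expand: Callable[[State, int], List[State]],
    score: Callable[[State], float],
    beam_size: int = 512,
    branching_factor: int = 32,
    max_iterations: int = 16,
) -> Tuple[List[State], int]:
    """Generic beam search; returns the final beam and how many candidates were proposed."""
    beam: List[State] = [initial]
    proposed = 0
    for _ in range(max_iterations):
        candidates: List[State] = []
        for node in beam:
            # Stand-in for the decoder returning `branching_factor` proposals per node.
            children = expand(node, branching_factor)
            proposed += len(children)
            candidates.extend(children)
        if not candidates:
            break
        # Keep only the `beam_size` highest-scoring candidates (score used as the value).
        beam = heapq.nlargest(beam_size, candidates, key=score)
    return beam, proposed


def toy_expand(node: State, branching_factor: int) -> List[State]:
    # Hypothetical toy expansion: each child appends one construction id.
    return [node + (d,) for d in range(branching_factor)]


if __name__ == "__main__":
    # Paper's size settings with the toy expand/score: prints 230432,
    # below the full-beam bound 512 * 32 * 16 = 262144.
    _, n = beam_search((), toy_expand, score=sum)
    print(n)
```

Swapping in a real proposal model would change only `expand` and `score`; the number of proposals is driven by the three size parameters and by how quickly the beam fills up.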


kingkongjaffa 889 days ago

Thanks, TIL what beam search is: https://en.wikipedia.org/wiki/Beam_search

refulgentis 889 days ago

IMHO: this bumps, hard, against limitations of language / human-machine analogies.

But let's try. TL;DR: 262,144, but don't take it literally:

- The output of a decoding function is a token, roughly 3/4 of a word. Let's just say 1 word.

- Tokens considered per output token = beam_size * branching_factor * max_iterations = 512 * 32 * 16 = 262,144.

- Let's take their sample solution and get a word count. https://storage.googleapis.com/deepmind-media/DeepMind.com/B...

- Total tokens (≈ words) in the solution = 2289

- Total # of tokens considered = 262,144 * 2289 = 600,047,616

- Hack: "number of solutions considered" = total tokens considered / total tokens in solution

- = 600,047,616 / 2289 = 262,144 (the same as the number of tokens considered per output token, which makes sense)
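The back-of-the-envelope arithmetic in the list above can be checked with a few lines of Python. The constants come from the thread; treating words as tokens and charging the full 262,144 considerations to every output token are the commenter's simplifying assumptions, not figures from the paper. Because the same 2289 is multiplied in and then divided back out, the "hack" estimate lands on the per-token count by construction.

```python
# Commenter's assumptions: 1 token ~= 1 word, and every output token costs
# beam_size * branching_factor * max_iterations considerations.
BEAM_SIZE = 512
BRANCHING_FACTOR = 32
MAX_ITERATIONS = 16
SOLUTION_TOKENS = 2289  # word count of the sample solution, treated as tokens

per_token = BEAM_SIZE * BRANCHING_FACTOR * MAX_ITERATIONS
total_considered = per_token * SOLUTION_TOKENS
# "Number of solutions considered": total considerations / tokens per solution.
solutions_considered = total_considered // SOLUTION_TOKENS

print(per_token, total_considered, solutions_considered)
# 262144 600047616 262144
```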