by refulgentis 889 days ago
IMHO: this bumps, hard, against limitations of language / human-machine analogies.

But let's try -- TL;DR 262,144, but don't take it literally:

- The output of a decoding function is a token. ~3/4 of a word. Let's just say 1 word.

- Tokens considered per output token = beam_size * branching_factor * max_iterations = 512 * 32 * 16 = 262,144.

- Let's take their sample solution and get a word count. https://storage.googleapis.com/deepmind-media/DeepMind.com/B...

- Total tokens for solution = 2289

- Total # of tokens considered = 600,047,616 = 262,144 * 2289

- Hack: "number of solutions considered" = total tokens considered / total tokens in solution

- = 600,047,616 / 2289 = 262,144 (same # as the tokens considered per output token, which makes sense: the solution length just cancels out)
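
If you want to check the arithmetic, here's a quick sketch in Python. The three search parameters and the 2289 token count are the figures quoted above; the function name is made up for illustration, and treating the product as "tokens considered" is the same back-of-envelope assumption as in the bullets, not a claim about how the actual search is implemented.

```python
# Search parameters as quoted in the comment above.
BEAM_SIZE = 512
BRANCHING_FACTOR = 32
MAX_ITERATIONS = 16

# Token count of the sample solution, as quoted in the comment above.
SOLUTION_TOKENS = 2289


def considered_per_output_token(beam_size, branching_factor, max_iterations):
    """Hypothetical helper: take the plain product at face value."""
    return beam_size * branching_factor * max_iterations


per_token = considered_per_output_token(BEAM_SIZE, BRANCHING_FACTOR, MAX_ITERATIONS)
total_considered = per_token * SOLUTION_TOKENS

# The "hack": dividing by the solution length undoes the multiplication above,
# so this lands back on the per-token figure by construction.
solutions_considered = total_considered // SOLUTION_TOKENS

print(f"{per_token:,}")             # 262,144
print(f"{total_considered:,}")      # 600,047,616
print(f"{solutions_considered:,}")  # 262,144
```

Which is really the "don't take it literally" part: the last number can't come out as anything other than the first one.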