IMHO: this bumps, hard, against the limitations of language and human-machine analogies. But let's try.

TL;DR: 262,144, but don't take it literally:

- The output of a decoding function is a token, roughly 3/4 of a word. Let's just call it 1 word.
- Tokens considered per output token = beam_size * branching_factor * max_iterations = 512 * 32 * 16 = 262,144.
- Let's take their sample solution and get a word count: https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
- Total tokens in the solution = 2,289 (counting each word as 1 token, per the above).
- Total number of tokens considered = 262,144 * 2,289 = 600,047,616.
- Hack: "number of solutions considered" = total tokens considered / total tokens in solution = 600,047,616 / 2,289 = 262,144. That is the same number as the tokens considered per output token, which makes sense: the solution length cancels out.
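For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The three search parameters and the 2,289 figure are the ones quoted above; the variable names are mine, and the "1 word = 1 token" shortcut is the same rough approximation, not a real tokenizer count.

```python
# Search parameters as quoted in the comment (not independently verified).
beam_size = 512
branching_factor = 32
max_iterations = 16

# Word count of the sample solution, treated as a token count
# under the rough "1 word = 1 token" assumption.
solution_tokens = 2289

# Tokens considered for each single output token.
tokens_considered_per_output_token = beam_size * branching_factor * max_iterations

# Tokens considered across the whole solution.
total_tokens_considered = tokens_considered_per_output_token * solution_tokens

# The "hack": divide back by the solution length to get a
# "number of solutions considered" figure. The length cancels out.
solutions_considered = total_tokens_considered // solution_tokens

print(f"{tokens_considered_per_output_token:,}")  # 262,144
print(f"{total_tokens_considered:,}")             # 600,047,616
print(f"{solutions_considered:,}")                # 262,144
```

Note that the last number is just the first one again, so the result does not depend on how long the solution is, only on the three search parameters.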