| I think you're right to bring up the NFLT, but I don't think it applies; it just points at the real question. The key assumption behind the NFLT is that each environment's vote has the same weight, i.e. we are targeting a uniform distribution over objective functions / environments / problems / whatever you call them. If you break this assumption, you get the opposite result: search algorithms divide into equivalence classes determined by the sets of different outcomes (traces, if I remember the theorem's description) that they discriminate between.
|
| A uniform distribution like this is actually a very strong precondition. Choosing an environment is like choosing one of 2^N strings given some encoding, so results about the complexity of sets of strings apply: the assumption implies that you care equally about a very large number of environments, most of which have no compressible structure or, equivalently, have a huge Kolmogorov complexity. Most of these environments have no compact encoding relative to a particular choice of machine, yet we are weighing them the same as the environments that can actually be implemented without a ridiculous amount of storage to represent the function.
|
| The reason I think this assumption is too strong is that we don't care about the astronomically many problems that have no compact encoding; we know this because we literally can't encounter them, as they would be too large to ever write down using ordinary matter. Allowing for this, talking usefully about evaluating an AGI, or equivalently a search strategy or optimization algorithm, requires an understanding of the distribution of environments / problems we care about. I think capturing this concept in a 'neat' way would be a significant contribution; I had a go during my PhD but failed to get anywhere. Unfortunately, things like K-complexity are uncomputable, so reasoning about distributions in those terms is a dead end.
|
Other authors (Legg and Hutter, 2007) followed the line of reasoning in your comment much more literally. They proposed measuring the intelligence of an agent as the infinite sum of the expected rewards the agent achieves in each computable environment, each weighted by 2^-K, where K is the environment's Kolmogorov complexity. This looks as if it gives "one true measure" of intelligence, but it doesn't, because Kolmogorov complexity depends on the choice of reference universal Turing machine (Hutter himself eventually acknowledged how big a problem this is for his definition; see Leike and Hutter, 2015).
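To make the reference-machine dependence concrete, here is a toy sketch, not the actual Legg-Hutter measure. Real Kolmogorov complexity is uncomputable, so the description lengths below are made-up stand-ins for "K under machine A" and "K under machine B", and the agents, environments, and rewards are all hypothetical. The only point is that the same 2^-K weighting can rank two agents in opposite orders once the machine changes.

```python
from fractions import Fraction

# Hypothetical expected rewards in [0, 1] for two agents on three toy environments.
REWARDS = {
    "agent_x": {"env1": Fraction(9, 10), "env2": Fraction(1, 10), "env3": Fraction(1, 2)},
    "agent_y": {"env1": Fraction(1, 10), "env2": Fraction(9, 10), "env3": Fraction(1, 2)},
}

# Invented description lengths in bits, standing in for Kolmogorov complexity
# relative to two different reference machines (the real K is uncomputable).
LENGTHS = {
    "machine_a": {"env1": 2, "env2": 5, "env3": 4},
    "machine_b": {"env1": 5, "env2": 2, "env3": 4},
}


def weighted_score(rewards, lengths):
    """Sum of expected rewards, each weighted by 2^-length of its environment."""
    return sum(Fraction(1, 2 ** lengths[env]) * r for env, r in rewards.items())


def ranking(machine):
    """Agents sorted from highest to lowest weighted score under one machine."""
    lengths = LENGTHS[machine]
    return sorted(
        REWARDS,
        key=lambda agent: weighted_score(REWARDS[agent], lengths),
        reverse=True,
    )


if __name__ == "__main__":
    for machine in LENGTHS:
        scores = {a: weighted_score(REWARDS[a], LENGTHS[machine]) for a in REWARDS}
        print(machine, ranking(machine), {a: float(s) for a, s in scores.items()})
```

With these invented numbers, agent_x comes out ahead under machine_a and agent_y under machine_b, even though nothing about the agents or environments changed. In the real setting the two machines' complexities differ only by an additive constant, but that constant is unbounded across machine pairs, so the same kind of reordering is possible.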
My position is that any attempt to come up with "one true comparison of intelligence" (as opposed to a parametrized family) should be viewed with skepticism, because relative intelligence really must depend on a lot of arbitrary choices.