by bob1029
412 days ago
> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

In the context of genetic programming and other non-traditional ML techniques, I've been having difficulty finding a simple fitness function that reliably proxies natural language string similarity, because of this effect.

For example, say you use something like common prefix length to measure how close a candidate's output string is to an objective string given an input string. The underlying learner will inevitably start doing things like repeating the input verbatim, especially if the input/output training tuples often share a lot of prefixes. So you might try reversing the input to force learning to take a less crappy path [0]. The learner may respond degenerately by inventing a string-reversing technique and repeating its prior behavior. So you iterate again and try something like base64 encoding the input. This might take, but eventually you wind up with so many weird hacks that the learner can't make progress and the meaning of the quantities evaporates.

Every metric I've ever looked at gets cheated in some way. The holy grail is probably normalized information distance (approximated by normalized compression distance), but then you have a whole new problem of finding an ideal universal compressor, which definitely doesn't exist.

[0]: https://arxiv.org/abs/1409.3215 (Figure 1)
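To make the two metrics mentioned above concrete, here is a rough Python sketch, not the actual setup from the comment. The function names (`common_prefix_length`, `ncd`, `echo_learner`) and the example strings are made up for illustration, and zlib is only a convenient stand-in for the ideal compressor, which is exactly the part that doesn't exist.

```python
import zlib


def common_prefix_length(candidate: str, target: str) -> int:
    """Naive fitness: number of leading characters the two strings share."""
    n = 0
    for a, b in zip(candidate, target):
        if a != b:
            break
        n += 1
    return n


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, with zlib as an assumed stand-in
    for the (nonexistent) ideal universal compressor.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx = len(zlib.compress(bx, 9))
    cy = len(zlib.compress(by, 9))
    cxy = len(zlib.compress(bx + by, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def echo_learner(source: str) -> str:
    """Degenerate 'learner' that just repeats its input verbatim."""
    return source


# Hypothetical training pair whose input and output share a long prefix.
source = "the quick brown fox jumps"
target = "the quick brown fox sleeps"

# The echo strategy earns a high prefix score without learning anything.
print(common_prefix_length(echo_learner(source), target))
print(ncd(echo_learner(source), target))
```

The prefix metric rewards the echo strategy for every character the input and target happen to share. NCD is harder to game that way, but its values are only as meaningful as the compressor behind it, and a real compressor like zlib carries fixed overhead that dominates on short strings.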
if only we could explain this in "politician" language... too many with too much power think the second coming will deliver the "ideal universal" which doesn't exist