|
|
|
|
|
by Quarrelsome
289 days ago
|
|
> if you can find a way to produce verifiable rewards about a target world I feel like there's an interesting symmetry here between the pre and post LLM world, where I've always found that organisations over-optimise for things they can measure (e.g. balance sheets) and under-optimise for things they can't (e.g. developer productivity), which explains why its so hard to keep a software product up to date in an average org, as the natural pressure is to run it into the ground until a competitor suddenly displaces it. So in a post LLM world, we have this gaping hole around things we either lack the data for, or as you say: lack the ability to produce verifiable rewards for. I wonder if similar patterns might play out as a consequence and what unmodelled, unrecorded, real-world things will be entirely ignored (perhaps to great detriment) because we simply lack a decent measure/verifiable-reward for it. |
|