probably the only way to make something like this affordable. But the way models are trained right now is completely wrong for this anyway. They’re nowhere near good enough at estimating their own uncertainty, and RLHF is to blame for the ‘crutch plans’ we get. I guess the point of the architecture is that it shouldn’t need big smart models to work well, but whatever models you use, what’s the post-training for forked execution going to look like? This sounds so expensive to train too.