Hacker News new | ask | show | jobs
by thomasahle 492 days ago
> Latent / embedding-space reasoning seems a step in the right direction

Might be good for reasoning, but it's terrible for interpretation / AI-safety.

1 comments

Why is it any different to do 4 recurrent passes than having a model that is 4x deeper?
Running one layer 4 times should fetch the weights of that layer once. Running 4 layers makes you fetch 4x parameters.

The recurrent approach is more efficient when memory bandwidth is the bottleneck. They talk about it in the paper.

Yeah, understood. I'm excited for the reduction in parameter count that will come when this is taken up in major models.

I meant it rhetorically in reference to interpretability. I don't see a real difference between training a model that is 100b parameters vs a (fixed) 4x recurrent 25b parameter model as far as understanding what the model is `thinking` for the next token prediction task.

You should be able to use the same interpretability tooling for either. It can only `scheme` so much before it outputs the next token no matter if the model is just a fixed size and quite deep, or recurrent.

I guess the most interpretable is to have as shallow a model as possible, but with longer cot. It would be quite interesting seeing the trade-off between the two. Though, unfortunately, deeper is probably better.