
	by thomasahle 492 days ago
	> Latent / embedding-space reasoning seems a step in the right direction

	Might be good for reasoning, but it's terrible for interpretation / AI-safety.

Tostino 491 days ago

How is doing 4 recurrent passes any different from having a model that is 4x deeper?

lonk11 491 days ago

Running one layer 4 times should fetch the weights of that layer once. Running 4 distinct layers makes you fetch 4x the parameters.

The recurrent approach is more efficient when memory bandwidth is the bottleneck. They talk about it in the paper.
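
A rough sketch of the accounting behind this, not taken from the paper: it assumes the repeated block's weights stay resident in fast memory across passes (the idealised case described above), and the block size and 2 bytes per parameter are hypothetical values chosen for illustration.

```python
def weight_bytes_fetched(params_per_block, unique_blocks, passes_per_block,
                         bytes_per_param=2, reuse_resident_weights=True):
    """Bytes of weights read from main memory for one forward pass.

    Assumption: if reuse_resident_weights is True, a block's weights are
    fetched once and reused for every pass through that block.
    """
    fetches_per_block = 1 if reuse_resident_weights else passes_per_block
    return params_per_block * bytes_per_param * unique_blocks * fetches_per_block


PARAMS = 1_000_000  # hypothetical number of parameters in one block

# 4 distinct blocks, each run once.
deep = weight_bytes_fetched(PARAMS, unique_blocks=4, passes_per_block=1)

# 1 block, run 4 times with its weights kept resident.
recurrent = weight_bytes_fetched(PARAMS, unique_blocks=1, passes_per_block=4)

# Same recurrent model if the weights had to be re-fetched on every pass.
recurrent_no_reuse = weight_bytes_fetched(
    PARAMS, unique_blocks=1, passes_per_block=4, reuse_resident_weights=False
)

print(deep, recurrent, recurrent_no_reuse)
```

The saving disappears if the weights cannot stay resident between passes, which is why the last case matches the deep model.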

Tostino 491 days ago

Yeah, understood. I'm excited for the reduction in parameter count that will come when this is taken up in major models.

I meant it rhetorically in reference to interpretability. I don't see a real difference between training a 100B-parameter model vs a (fixed) 4x recurrent 25B-parameter model as far as understanding what the model is `thinking` for the next token prediction task.

You should be able to use the same interpretability tooling for either. It can only `scheme` so much before it outputs the next token no matter if the model is just a fixed size and quite deep, or recurrent.
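
A toy illustration of that point, under loud assumptions: the "layer" here is a made-up elementwise affine map plus tanh, and all weights are arbitrary numbers. It only shows that a fixed 4-pass recurrent model and a 4-layer model unroll to the same number of steps, so the same per-step trace can be inspected in both, while the unique parameter counts differ.

```python
import math


def layer(weight, bias, hidden):
    """One toy layer: elementwise affine map followed by tanh."""
    return [math.tanh(w * h + b) for w, h, b in zip(weight, hidden, bias)]


def run(layers, hidden):
    """Apply layers in order and record the hidden state after each step."""
    trace = [hidden]
    for weight, bias in layers:
        hidden = layer(weight, bias, hidden)
        trace.append(hidden)
    return trace


def unique_param_count(layers):
    """Count parameters, counting each shared (weight, bias) object once."""
    unique = {id(params): params for params in layers}
    return sum(len(w) + len(b) for w, b in unique.values())


# Hypothetical weights: one shared block applied 4 times...
shared = ([0.5, -1.0, 2.0], [0.1, 0.0, -0.2])
recurrent_layers = [shared] * 4

# ...versus 4 distinct blocks.
deep_layers = [
    ([0.5, -1.0, 2.0], [0.1, 0.0, -0.2]),
    ([1.5, 0.3, -0.7], [0.0, 0.2, 0.1]),
    ([-0.4, 0.9, 1.1], [0.3, -0.1, 0.0]),
    ([0.8, -0.6, 0.2], [-0.2, 0.1, 0.4]),
]

x = [1.0, 2.0, 3.0]
recurrent_trace = run(recurrent_layers, x)
deep_trace = run(deep_layers, x)

# Both traces have one entry per step, so the same probe can walk either.
print(len(recurrent_trace), len(deep_trace))
print(unique_param_count(recurrent_layers), unique_param_count(deep_layers))
```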

thomasahle 491 days ago

I guess the most interpretable option is to have as shallow a model as possible, but with a longer CoT. It would be quite interesting to see the trade-off between the two. Though, unfortunately, deeper is probably better.