Language modeling:
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach https://arxiv.org/pdf/2502.05171
Puzzle solving:
A Simple Loss Function for Convergent Algorithm Synthesis using RNNs https://openreview.net/pdf?id=WaAJ883AqiY
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking https://arxiv.org/abs/2202.05826
Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks https://proceedings.neurips.cc/paper/2021/file/3501672ebc68a...
General:
Think Again Networks and the Delta Loss https://arxiv.org/pdf/1904.11816
Universal Transformers https://arxiv.org/abs/1807.03819
Adaptive Computation Time for Recurrent Neural Networks https://arxiv.org/pdf/1603.08983