This seems really weird to me. Isn't that just using LLMs in a specific way? Why come up with a new name "RLM" instead of saying "LLM"? Nothing changes about the model.
It's a new architecture for building agents, not a new model. You still have LLMs, but you wrap them in an agentic loop with a REPL environment, where the LLM can try to solve the problem more programmatically.
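To make the "agentic loop with a REPL" idea concrete, here's a rough sketch. It's not taken from any RLM implementation: `run_repl_loop`, `scripted_llm`, the `context` variable, and the "set `answer` when done" convention are all made up for illustration, and the model is a scripted stand-in rather than a real LLM call.

```python
import contextlib
import io
from typing import Callable


def run_repl_loop(llm: Callable[[str], str], context: str, max_steps: int = 5) -> str:
    """Drive a hypothetical `llm` callable through a REPL until it sets `answer`."""
    # Assumption: the long input lives in the REPL namespace, not in the prompt.
    namespace = {"context": context}
    transcript = (
        f"A variable `context` holds {len(context)} characters. "
        "Write Python; set `answer` when done."
    )
    for _ in range(max_steps):
        code = llm(transcript)  # the model replies with code to run
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # a real system would sandbox this
        except Exception as exc:
            buffer.write(f"{type(exc).__name__}: {exc}")
        if "answer" in namespace:
            return str(namespace["answer"])
        # Feed the code and its output back so the model can take another step.
        transcript += f"\n>>> {code}\n{buffer.getvalue()}"
    raise RuntimeError("no answer within max_steps")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: inspect the data first, then answer."""
    if ">>>" not in transcript:
        return "print(context.count('needle'))"
    return "answer = context.count('needle')"


result = run_repl_loop(scripted_llm, "hay needle hay needle hay")
print(result)
```

The point of the sketch is that the model itself is untouched. What changes is the loop around it: the model writes code, sees the output, and decides what to run next.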
It ended up kicking off reasoning training which enabled the massive gains in coding, tool use, and more over the last 18 months.
So yeah, it's "just using LLMs in a specific way."