HN Mirror



	by -_- 280 days ago
	DSPy is great for prompt optimization but not so much for RL fine-tuning (their support is "extremely EXPERIMENTAL"). The nice thing about RL is that the exact prompts don't matter so much. You don't need to spell out every edge case, since the model will get an intuition for how to do its job well via the training process.

1 comments

nextworddev 280 days ago

Isn’t the latest trend in RL mostly about prompt optimization, as opposed to full fine-tuning?


ag8 280 days ago

Prompt optimization is very cool, and we use it for certain problems! The main goal with this launch is to democratize access to "the real thing"; in many cases, full RL gets you the last few percent of reliability on things like complex agentic workflows, where prompt optimization doesn't quite get you far enough.

There are also lots of interesting possibilities, such as RLing a model on a bunch of environments and then prompt-optimizing it on each specific one, which seems way better than, like, training and hot-swapping many LoRAs. In any case, _someone_ ought to provide a full RL API, and we're here to do that well!


nextworddev 280 days ago

Thanks. Is this mainly for verifiable tasks, or for any general task?


ag8 280 days ago

It's for any task that has an "eval", which usually means verifiable tasks or ones that can be judged by LLMs (e.g. see [0]). There's also been recent work such as BRPO [1] and similar approaches that give more and more "non-verifiable" tasks verifiable rewards!

[0]: https://runrl.com/blog/funniest-joke

[1]: https://arxiv.org/abs/2506.00103


-_- 280 days ago

There needs to be some way of automatically assessing performance on the task, though this could be with a Python function or another LLM as a judge (or a combination!)
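
To make the "combination" case concrete, here is a minimal sketch of a reward that blends a programmatic check with a judge score. Everything in it is hypothetical and for illustration only: `exact_match_reward`, `combined_reward`, `toy_judge`, and the `judge_weight` parameter are made-up names, not part of any particular RL API, and `toy_judge` is just a stand-in for a real LLM-as-judge call.

```python
from typing import Callable


def exact_match_reward(completion: str, expected: str) -> float:
    """Verifiable check: 1.0 if the normalized answer matches, else 0.0."""
    return 1.0 if completion.strip().lower() == expected.strip().lower() else 0.0


def combined_reward(
    completion: str,
    expected: str,
    judge: Callable[[str], float],
    judge_weight: float = 0.25,  # hypothetical weighting, tune per task
) -> float:
    """Blend the programmatic check with a judge score, both in [0, 1]."""
    check = exact_match_reward(completion, expected)
    # Clamp the judge output so a misbehaving judge cannot push the
    # reward outside [0, 1].
    judged = min(1.0, max(0.0, judge(completion)))
    return (1.0 - judge_weight) * check + judge_weight * judged


def toy_judge(completion: str) -> float:
    """Stand-in for an LLM judge (assumption): prefers concise answers."""
    return 1.0 if len(completion.strip()) <= 20 else 0.5


if __name__ == "__main__":
    print(combined_reward(" Paris ", "paris", toy_judge))
    print(combined_reward("London is the capital of France, I think", "paris", toy_judge))
```

In practice the judge would be a call to another model, and the weighting between the two signals is a design choice: leaning on the programmatic check keeps the reward harder to game, while the judge term can score qualities a function can't easily test.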
