by hypoxia 543 days ago
I think the defining story of 2025 will be AI agents getting very good at computer use, largely enabled by RL fine-tuning.

Let's hope so; computer use with AI is currently absolutely terrible. I expected far more progress in this area this year, but it's no better than last year.
Yeah, +1. Looking back at the WebVoyager [1] and GPT-4V generalist agent [2] papers from last January, it feels like we haven't come that far.

But there are now several major technical unlocks: fine-tuning for cursor locations (in Claude), better reasoning with o3, and RL fine-tuning, which lets a model learn from whether it actually succeeded at a task.

That gives me significant hope.

[1] https://arxiv.org/abs/2401.13919

[2] https://arxiv.org/abs/2401.01614

Could you help me understand the importance of RL fine-tuning? What can it accomplish that regular fine-tuning can't? What's a use case for it?
From my experience there are three key issues with agents today:

1. They usually don't complete the right sequence of steps for a task when run within our human-defined frameworks (ReAct, ReWOO, supervisor-worker, multi-agent teams, etc.)

2. They get lost easily, forgetting what they were doing or repeating the same tasks over and over in a loop (bad planning)

3. They exit early, thinking they have completed the task when they have not (bad evaluation)
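To make failure modes 2 and 3 concrete, here is a minimal Python sketch of a hypothetical harness. `run_agent`, `policy`, and `is_task_complete` are made-up names, not any framework's API. It assumes actions are plain strings and that you have some external way to check the real task state. It doesn't fix bad planning; it only classifies how an episode ended.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EpisodeReport:
    actions: List[str] = field(default_factory=list)
    outcome: str = "running"  # "success", "early_exit", "loop", or "budget_exhausted"


def run_agent(policy: Callable[[List[str]], Optional[str]],
              is_task_complete: Callable[[List[str]], bool],
              max_steps: int = 20,
              loop_window: int = 3) -> EpisodeReport:
    """Run a hypothetical agent policy and label how the episode ended.

    policy(history) returns the next action, or None when the agent claims it is done.
    is_task_complete(history) is an assumed external check of the real task state.
    """
    report = EpisodeReport()
    for _ in range(max_steps):
        action = policy(report.actions)
        if action is None:
            # Failure mode 3: the agent says it's done, so verify that claim independently.
            report.outcome = "success" if is_task_complete(report.actions) else "early_exit"
            return report
        report.actions.append(action)
        recent = report.actions[-loop_window:]
        if len(recent) == loop_window and len(set(recent)) == 1:
            # Failure mode 2: the same action repeated loop_window times in a row.
            report.outcome = "loop"
            return report
    # Neither finished nor obviously looping within the step budget.
    report.outcome = "budget_exhausted"
    return report
```

An outcome label like this is also the kind of task-success signal that RL fine-tuning could train against, assuming the success check itself is reliable.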

The jump in reasoning ability from 4o to o3 will enable a drastic improvement in planning and execution within our human-defined frameworks.

But, more importantly, I believe RL fine-tuning will enable the model to learn better general approaches to planning and executing steps to complete work. This is Sutton's "bitter lesson" at work: general methods that learn from experience eventually beat hand-engineered structure.
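A toy Python sketch of the difference the question above asks about, under heavily simplified assumptions: a hypothetical 3-step task where the "model" is just a table of per-step logits and success is all-or-nothing. `sft_update` needs a labeled demonstration of the right actions; `rl_update` (plain REINFORCE with a running-average baseline, not OpenAI's actual RFT method) only needs to know whether a sampled rollout succeeded. All names here are illustrative.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1


class ToyTask:
    """Hypothetical task: succeeds only if every step's action matches a hidden plan."""

    def __init__(self, plan):
        self.plan = plan

    def reward(self, actions):
        # Sparse, outcome-only signal: 1.0 on task success, 0.0 otherwise.
        return 1.0 if list(actions) == list(self.plan) else 0.0


def sft_update(logits, demo, lr):
    # Supervised fine-tuning: gradient ascent on log-likelihood of a labeled demo action.
    for t, a in enumerate(demo):
        probs = softmax(logits[t])
        for k in range(len(probs)):
            logits[t][k] += lr * ((1.0 if k == a else 0.0) - probs[k])


def rl_update(logits, task, lr, baseline, rng):
    # RL fine-tuning (REINFORCE): sample a rollout, score it with the task's success
    # signal, and reinforce the sampled actions in proportion to (reward - baseline).
    actions = [sample(softmax(step), rng) for step in logits]
    adv = task.reward(actions) - baseline
    for t, a in enumerate(actions):
        probs = softmax(logits[t])
        for k in range(len(probs)):
            logits[t][k] += lr * adv * ((1.0 if k == a else 0.0) - probs[k])
    return task.reward(actions)


def train_rl(task, n_actions, episodes=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * n_actions for _ in task.plan]
    baseline = 0.0
    for _ in range(episodes):
        r = rl_update(logits, task, lr, baseline, rng)
        baseline = 0.95 * baseline + 0.05 * r  # running average reduces variance
    return logits


def success_prob(logits, plan):
    p = 1.0
    for step, a in zip(logits, plan):
        p *= softmax(step)[a]
    return p


def greedy(logits):
    return [max(range(len(step)), key=lambda k: step[k]) for step in logits]


if __name__ == "__main__":
    task = ToyTask(plan=[2, 0, 1])
    learned = train_rl(task, n_actions=3)
    print(greedy(learned), round(success_prob(learned, task.plan), 3))
```

The design point is in the signatures: the RL path only consumes `task.reward(actions)`, so in principle it can improve behavior on tasks where you can verify success but can't easily write down the correct trajectory.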

For me, desktop automation is the killer app of RL fine-tuning, rather than better reasoning in chatbot apps and APIs.

When OpenAI releases their desktop agent capabilities built on this, hopefully in January, I think we're going to see another ChatGPT moment.

Even if not, the ability to easily train the system to complete your own tasks successfully using the full desktop is going to be a major unlock for enterprises.

More on RL fine-tuning here: https://openai.com/form/rft-research-program/