HN Mirror


by trjordan 2 hours ago

This is RL, right? Like, this is exactly why models have mostly converged on an obvious style: we literally train them on thumbs-up/thumbs-down data about what good behavior and good code look like.

And that's why it's so hard to get a model to reproduce the specific taste of a person or an organization. My taste is different from yours, so if we dump our aggregate preferences into RL, it averages out to nothing interesting.
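To make the "averages out" point concrete, here's a toy sketch. It assumes two hypothetical reviewers ("alice" and "bob") casting +1/-1 votes on two style choices, and that the pooled signal is a plain mean of their votes. Real reward modeling is far more involved; this only shows how opposite tastes cancel.

```python
# Hypothetical thumbs-up (+1) / thumbs-down (-1) votes from two reviewers
# with opposite tastes on the same two stylistic choices.
votes = {
    "snake_case": {"alice": +1, "bob": -1},
    "camelCase": {"alice": -1, "bob": +1},
}


def mean_reward(style: str) -> float:
    """Average every reviewer's vote for one style, as pooled feedback would."""
    per_reviewer = votes[style].values()
    return sum(per_reviewer) / len(votes[style])


# Pooled across reviewers, neither style is preferred.
pooled = {style: mean_reward(style) for style in votes}

# Per reviewer, the preference is strong and unambiguous.
alice_pick = max(votes, key=lambda s: votes[s]["alice"])

print(pooled)      # {'snake_case': 0.0, 'camelCase': 0.0}
print(alice_pick)  # snake_case
```

Each individual signal is clear, but the aggregate is flat, so there's no "taste" left to learn from it.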

For the code-writing case, this means you end up reviewing every line of code, looking for places where you'd thumbs-down the code. Not every line of code contains a real decision, though, so it feels like a waste of time.


andy99 59 minutes ago

It’s supervised learning rather than RL; you’re just training to labels. It doesn’t work (doesn’t generalize) because there is no guarantee, or even expectation, that any causal relationship is learned. It’s just whatever convenient pattern gets the lowest loss. There’s a lot of research on this, for those unaware.


eithed 1 hour ago

Yes and no.

Say I ask you which convention you want to follow for your database columns: camelCase or snake_case? There's no correct global answer. There's no overarching truth that applies to all databases in existence (even if you focus on a certain type of database). Hence the no.

But yes, because in the context of an existing system there is a convention. If it's snake_case, you create new tables with snake_case column names.

LLMs will generally follow conventions, but sometimes they won't, because global truths (or at least "the last article it read" truths) sometimes win over local ones (I assume).
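A rough sketch of "follow the convention that's already there": count how existing column names are cased and name the new column to match. The helpers `detect_convention` and `format_column` are hypothetical, the regexes only cover simple lowercase snake_case and lower camelCase, and nothing here claims to be how an LLM actually decides.

```python
import re

# Simple patterns: "user_id" (snake_case) and "userId" (lower camelCase).
SNAKE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
CAMEL = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$")


def detect_convention(columns: list[str]) -> str:
    """Vote on the casing used by existing columns.

    Single-word names like "id" match neither pattern, so they don't vote.
    Returns "unknown" when there's no clear winner.
    """
    snake = sum(1 for c in columns if SNAKE.match(c))
    camel = sum(1 for c in columns if CAMEL.match(c))
    if snake > camel:
        return "snake_case"
    if camel > snake:
        return "camelCase"
    return "unknown"


def format_column(words: list[str], convention: str) -> str:
    """Join words into a column name; anything but camelCase falls back to snake_case."""
    if convention == "camelCase":
        return words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return "_".join(w.lower() for w in words)


existing = ["id", "user_id", "created_at", "email"]
convention = detect_convention(existing)
print(convention)                                       # snake_case
print(format_column(["last", "login", "at"], convention))  # last_login_at
```

The "unknown" case is exactly where a global default (here, snake_case) wins over local context.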


paytonjjones 2 hours ago

This is, in short, the big current problem with AI.

LLMs are built for scale, so they've given up on the kind of online learning / "long term memory" processes that would individualize them.

The LLM is permanently locked to being a really cracked engineer on their first day at your company, looking at your codebase for the first time.

You can scaffold a bit with .md files, but at the moment they lack the ability to do what humans do: go to sleep, encode things from short to long term memory, and wake up the next day with more specific knowledge baked in.


trjordan 2 hours ago

100%. The problem with them isn't making sure they're doing the right thing, it's making sure they're not making bad assumptions.

IMHO this is where code review goes until we fix the individualized model thing: you need to review the decisions the agent made, where you didn't steer. Most will be right. A few will be disastrously wrong. But decision-by-decision is a lot less to review than line-by-line of code.


pixl97 57 minutes ago

Yeah, individual learning is super expensive right now, and scale is the only way to pay for training at this point. Maybe we'll get this at some point in the future.


plastic-enjoyer 2 hours ago

> LLMs are built for scale so they've given up on the kind of online learning / "long term memory" processes that would individualize them.

I wonder if this is even desirable from a product perspective. You probably don't want online learning in a product that you are selling because you can't guarantee a consistent quality of the product.


paytonjjones 1 hour ago

You could say the same thing about employees!

And to be fair, the ability to fire employees and hire new ones is pretty important for that reason. In cases where you can't easily fire employees (e.g. unions), you encounter the very problem you're describing, and it often leads to companies preferring more consistent automations.
