| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by deepsquirrelnet 240 days ago

I go back and forth on this. A year ago, I was optimistic and I have had 1 case where RL fine tuning a model made sense. But while there are pockets of that, there is a clash with existing industry skills. I work with a lot of machine learning engineers and data scientists and here’s what I observe.

- many, if not most MLEs that got started after LLMs do not generally know anything about machine learning. For lack of clearer industry titles, they are really AI developers or AI devops

- machine learning as a trade is moving toward the same fate as data engineering and analytics. Big companies only want people using platform tools. Some ai products, even in cloud platforms like azure, don’t even give you the evaluation metrics that would be required to properly build ml solutions. Few people seem to have an issue with it.

- fine tuning, especially RL, is packed with nuance and details… lots to monitor, a lot of training signals that need interpretation and data refinement. It’s a much bigger gap than training simpler ML models, which people are also not doing/learning very often.

- The limited number of good use cases means people are not learning those skills from more senior engineers.

- companies have gotten stingy with sme-time and labeling

What confidence do companies have in supporting these solutions in the future? How long will you be around and who will take up the mantle after you leave?

AutoML never really panned out, so I’m less confident that platforming RL will go any better. The unfortunate reality is that companies are almost always willing to pay more for inferior products because it scales. Industry “skills” are mostly experience with proprietary platform products. Sure they might list “pytorch” as a required skill, but 99% of the time, there isn’t hardly anyone at the company that has spent any meaningful time with it. Worse, you can’t use it, because it would be too hard to support.

4 comments

daemonologist 240 days ago

Labels are so essential - even if you're not training anything, being able to quickly and objectively test your system is hugely beneficial - but it's a constant struggle to get them. In the unlikely event you can get budget and priority for an SME to do the work, communicating your requirements to them (the need to apply very consistent rules and make few errors) is difficult and the resulting labels tend to be messy.

More than once I've just done labeling "on my own time" - I don't know the subject as well but I have some idea what makes the neurons happy, and it saves a lot of waiting around.

I've found tuning large models to be consistently difficult to justify. The last few years it seems like you're better off waiting six months for a better foundation model. However, we have a lot of cases where big models are just too expensive and there it can definitely be worthwhile to purpose-train something small.

link

hommes-r 239 days ago

My personal opinion is that true engineering, which revolves around turning complex theory into working practice, has seen a decline in grace. Why spend a lot of time trying to master the art of engineering if you can ride the wave of engineering services and get away with it?

In true hacker spirit, I don't think trying to train a model on a wonky GPU is something that needs an ROI for the individual engineer. It's something they do because they yearn to acquire knowledge.

link

sdenton4 240 days ago

Eventually someone will make a killing on doing actual outcome measurements instead of just trusting the LLMs, Michael Lewis will write a popular book about it, and the cycle will begin anew...

link

XenophileJKO 240 days ago

I'm also seeing teams who expected big gains from fine tuning get incremental or moderate gains. Then they put it in production and regret the action as SOTA marches quickly.

I have avoided fine tuning because the models are currently improving at a rate that exceeds big corporate product development velocity.

link

deepsquirrelnet 240 days ago

Absolutely the first thing you should try is a prompt optimizer. The GEPA optimizer (implemented in DSPy) often outperforms GRPO training[1]. But I think people are usually building with frameworks that aren't machine learning frameworks.

[1] https://arxiv.org/abs/2507.19457

link