
by danielmarkbruce 630 days ago

That prompts aren't science means little. If anything it makes them more important because you can't systematically arrive at good ones.

If one spends a lot of time building an application to achieve an actual goal, they'll realize that prompts make a gigantic difference and that it takes an enormous amount of fiddly, annoying work to improve them. I do this (and I built an agent system, which was more straightforward to do...) in financial markets. So much so that people build systems just to be able to iterate on prompts (https://www.promptlayer.com/).

I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome. I expect if you did, you'd know how much (non-sexy) work is involved in prompting that is hard to replicate.

Hell, papers get published that are just about prompting!

https://arxiv.org/abs/2201.11903

This line of thought effectively led to OpenAI's o1. Good prompts -> good output -> good training data -> good model.


dartos 630 days ago

> If anything it makes them more important because you can't systematically arrive at good ones

Being important and being easy to make aren’t mutually exclusive.

I never said prompts didn’t matter, just that they’re so easy to make and so similar to others that they aren’t a moat.

> I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome.

You’re very wrong. Don’t make assumptions like this. I’ve been a full stack (mostly backend) dev for about 15 years and started working with natural language processing back in 2017 around when word2vec was first published.

Prompts are not difficult; they are time-consuming. It’s all trial and error. Data entry is also time-consuming, but it isn’t difficult and doesn’t provide any moat.

> that is hard to replicate.

Because there are so many factors at play besides prompting. Prompting is the easiest thing to do in any agent or RAG pipeline. It’s all the other settings and infra that are difficult to tune to replicate a given result (good chunking of documents, ensuring only high-quality data gets into the system in the first place, etc.).

Not to mention needing to know the exact model and seed used.

Nothing on ChatGPT is reproducible, for example, simply because they include the timestamp in their system prompt.
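
To make the reproducibility point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `build_system_prompt` and `request_fingerprint` are hypothetical helpers, `"example-model"` is a made-up model name, and the prompt wording is invented. It is not how ChatGPT builds its requests; it only shows that once a timestamp is part of the system prompt, two otherwise identical requests no longer have the same input.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_system_prompt(now=None):
    """Hypothetical system prompt; the wording is assumed for illustration."""
    base = "You are a helpful assistant."
    if now is None:
        return base
    # Embedding the current time makes the prompt differ on every request.
    return f"{base} Current time: {now.isoformat()}"


def request_fingerprint(model, seed, system_prompt, user_prompt):
    """Hash everything that determines the model's input.

    Two requests can only be expected to reproduce each other if their
    fingerprints match (and even then only if the backend is deterministic).
    """
    payload = json.dumps(
        {"model": model, "seed": seed, "system": system_prompt, "user": user_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same model, seed, and prompts: identical fingerprints.
fixed_a = request_fingerprint("example-model", 42, build_system_prompt(), "Summarize Q3.")
fixed_b = request_fingerprint("example-model", 42, build_system_prompt(), "Summarize Q3.")

# Same model, seed, and user prompt, but the system prompt carries a timestamp.
t1 = datetime(2024, 9, 20, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 9, 20, 12, 0, 1, tzinfo=timezone.utc)
stamped_a = request_fingerprint("example-model", 42, build_system_prompt(t1), "Summarize Q3.")
stamped_b = request_fingerprint("example-model", 42, build_system_prompt(t2), "Summarize Q3.")

print(fixed_a == fixed_b)      # True
print(stamped_a == stamped_b)  # False
```

Changing the model name or the seed changes the fingerprint in the same way, which is why you need both, plus the exact prompts, before you can even attempt to replicate a result.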

> Good prompts -> good output -> good training data -> good model.

This is not correct at all. I’m going to assume you made a mistake since this makes it look like you think that models are trained on their own output, but we know that synthetic datasets make for poor training data. I feel like you should know that.

A good model will give good output. Good output can be directed and refined with good prompting.

It’s not hard to make good prompts, just time-consuming.

They provide no moat.


danielmarkbruce 630 days ago

There is a lot of nonsense in here, for example:

> but we know that synthetic datasets make for poor training data

This is a silly generalization. Just google "synthetic data for training LLMs" and you'll find a bunch of papers on it. Here's a decent survey: https://arxiv.org/pdf/2404.07503

It's very likely o1 used synthetic data to train the model and/or the reward model they used for RLHF. Why do you think they don't output the chains...? They literally tell you - competitive reasons.

Arxiv is free, pick up some papers. Good deep learning texts are free, pick some up.


dartos 629 days ago

Sure, hand wave away my entire comment as “nonsense” and ignore how statistics works.

Training a model on synthetic data (obviously) increases bias present in the initial dataset[1], making for poor training data.

IIRC (this subject is a little fuzzy for me), using synthetic data for RLHF is equivalent to just using DPO, so if they did RLHF it probably wasn’t with synthetic data. They may have gone with DPO, though.

[1] https://arxiv.org/html/2403.07857v1


danielmarkbruce 629 days ago

Did you read this paper? No one is suggesting o1 was trained with 100% synthetic or 50% or anything of that nature. Generalizing that "synthetic data is bad" from "training exclusively/majority on synthetic data is bad" is dumb.

Researchers are using synthetic data to train LLMs, especially for fine tuning, and especially instruct fine tuning. You are not up to date with recent work on LLMs.


dartos 629 days ago

> No one is suggesting o1 was trained with 100% synthetic or 50% or anything of that nature.

Neither was I.

> "synthetic data is bad"

I never said that… I said that it makes for poor training data, which it does.

> Researchers are using synthetic data to train LLMs, especially for fine tuning, and especially instruct fine tuning

Then those researchers are training with subpar datasets, as the bias in that data will be compounded.

It’s a trade-off, since there’s only so much fresh data in the form you want. If they could use entirely non-synthetic data, I’m sure they would.

And again, you’re choosing to focus on this one point rather than my main point that prompts provide no moat.

> You are not up to date with recent work on LLMs.

There you go again making assumptions…

I think I’m done with this conversation though.


yunwei37 630 days ago

I think what actually matters is the "input" and the "interaction". The prompt is just one of them. The key is that you put how you think and how you solve the problem into it and build a system. Not just computer systems: "multi-agent" setups and "human society" are also systems.
