by danielmarkbruce 621 days ago
There is a lot of nonsense in here, for example:

> but we know that synthetic datasets make for poor training data

This is a silly generalization. Just google "synthetic data for training LLMs" and you'll find a bunch of papers on it. Here's a decent survey: https://arxiv.org/pdf/2404.07503

It's very likely o1 used synthetic data to train the model and/or the reward model they used for RLHF. Why do you think they don't output the chains...? They literally tell you - competitive reasons.

Arxiv is free, pick up some papers. Good deep learning texts are free, pick some up.


Sure, hand wave away my entire comment as “nonsense” and ignore how statistics works.

Training a model on synthetic data (obviously) increases bias present in the initial dataset[1], making for poor training data.

IIRC (this subject is a little fuzzy for me) using synthetic data for RLHF is equivalent to just using DPO, so if they did RLHF it probably wasn’t with synthetic data. They may have gone with DPO, though.

[1] https://arxiv.org/html/2403.07857v1

Did you read this paper? No one is suggesting o1 was trained with 100% synthetic or 50% or anything of that nature. Generalizing that "synthetic data is bad" from "training exclusively/majority on synthetic data is bad" is dumb.

Researchers are using synthetic data to train LLMs, especially for fine tuning, and especially instruct fine tuning. You are not up to date with recent work on LLMs.

> No one is suggesting o1 was trained with 100% synthetic or 50% or anything of that nature.

Neither was I.

> "synthetic data is bad"

I never said that… I said that it makes for poor training data, which it does.

> Researchers are using synthetic data to train LLMs, especially for fine tuning, and especially instruct fine tuning

Then those researchers are training with subpar datasets as the bias in that data will be compounded.

It’s a trade-off, since there’s only so much fresh data in the form you want. If they could use entirely non-synthetic data, I’m sure they would.

And again, you’re choosing to focus on this one point rather than my main point, which is that prompts provide no moat.

> You are not up to date with recent work on LLMs.

There you go again making assumptions…

I think I’m done with this conversation though.

I think what actually matters is the "input" and the "interaction". The prompt is just one of them. The key is that you put how you think and how you solve the problem into it and build a system. And not just a computer system: "multi-agent" setups and "human society" are also systems.