| HN Mirror



	by CSSer 372 days ago
	It doesn't help that, thanks to RLHF, whenever a good example of this gains popularity (e.g. "How many Rs are in 'strawberry'?"), it's often snuffed out quickly. If I worked at a company with an LLM product, I'd build tooling to look for these kinds of examples on social media or directly in usage data so they could be prioritized for fixes. I don't know how to feel about this. On the one hand, it's sort of like red teaming. On the other hand, it clearly gives consumers a false sense of the model's ability.

1 comment

spion 372 days ago

Indeed. Which is why I think the only way to really evaluate the progress of LLMs is to curate your own personal set of example failures that you don't share with anyone else, and to use it only via APIs that provide some sort of no-data-retention and no-training guarantee.
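
For what it's worth, a private failure set like this can be tiny. Here is a minimal sketch, not anyone's actual tooling: `PrivateCase`, `run_private_eval`, and `pass_rate` are hypothetical names, the substring check is a deliberately crude stand-in for real grading, and `ask` is an assumed callable that you would wire to whichever API gives you the retention and training guarantees you trust.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class PrivateCase:
    """One prompt from a personal, unshared failure set (hypothetical type)."""
    prompt: str
    must_contain: str  # text a correct answer is expected to include


def run_private_eval(
    cases: Sequence[PrivateCase],
    ask: Callable[[str], str],
) -> List[PrivateCase]:
    """Send each prompt through `ask` and return the cases that still fail.

    `ask` is an assumption: any function that takes a prompt and returns the
    model's text. Keeping it injectable means the cases never have to leave
    your machine except through the endpoint you chose.
    """
    failures = []
    for case in cases:
        answer = ask(case.prompt)
        # Crude check: case-insensitive substring match.
        if case.must_contain.lower() not in answer.lower():
            failures.append(case)
    return failures


def pass_rate(cases: Sequence[PrivateCase], failures: Sequence[PrivateCase]) -> float:
    """Fraction of cases passed; an empty set counts as fully passing."""
    total = len(cases)
    return 1.0 if total == 0 else (total - len(failures)) / total
```

Rerunning the same cases against each new model release and comparing `pass_rate` over time gives a rough progress signal that can't have been patched in response to a viral example, since nobody else has seen the prompts.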
