| HN Mirror



	by dr_dshiv 1117 days ago
	If this is true, one should be able to compare with benchmarks or evals to demonstrate this. Anyone know more about this?


caddemon 1117 days ago

Yeah, I think it's plausible that it's gotten worse, but it would also be classic human psychology to perceive degradation because you start noticing flaws once the honeymoon effect wears off.

Unfortunately this will be hard to benchmark unless someone was already collecting a lot of data on ChatGPT responses for other purposes. Perhaps if this is happening the degradation will get worse though, so someone noticing it now could start collecting GPT responses longitudinally.
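
A minimal sketch of what that longitudinal collection could look like, assuming a fixed prompt set and a simple pass/fail check per prompt. The names `record_snapshot`, `pass_rate_by_snapshot`, and `query_model` are hypothetical; `query_model` stands in for whatever function actually sends a prompt to the model and returns its reply, and no real API client is assumed here.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def record_snapshot(
    prompts: Iterable[str],
    query_model: Callable[[str], str],  # hypothetical: caller supplies the model call
    log_path: Path,
    timestamp: Optional[float] = None,
) -> int:
    """Append one timestamped record per prompt to a JSONL log; return the count."""
    ts = time.time() if timestamp is None else timestamp
    count = 0
    with log_path.open("a", encoding="utf-8") as f:
        for prompt in prompts:
            record = {"ts": ts, "prompt": prompt, "response": query_model(prompt)}
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def pass_rate_by_snapshot(
    log_path: Path,
    checks: Dict[str, Callable[[str], bool]],
) -> Dict[float, float]:
    """Return {snapshot timestamp: fraction of checked prompts that passed}."""
    totals: Dict[float, tuple] = {}
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            check = checks.get(rec["prompt"])
            if check is None:
                continue  # no check defined for this prompt, skip it
            passed, seen = totals.get(rec["ts"], (0, 0))
            totals[rec["ts"]] = (passed + int(check(rec["response"])), seen + 1)
    return {ts: passed / seen for ts, (passed, seen) in totals.items()}
```

Running `record_snapshot` on the same prompts at regular intervals and then comparing the output of `pass_rate_by_snapshot` across timestamps would at least separate measured change from impression, though a small prompt set with crude checks will only catch large shifts.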

boringuser2 1117 days ago

Yes, that's an obvious complication, but it isn't the fault of the humans given that the model can easily be tuned without your knowledge to subjectively perform worse, and there's an obvious incentive for it (compute cost).

caddemon 1117 days ago

Yeah I fully agree about compute cost, though I wonder why they don't just introduce another payment tier. If people are really using it at work as much as claimed online, it would be much preferable to be able to pay more for the full original performance, which seems win/win.

boringuser2 1117 days ago

Because that involves telling customers that the product they are paying for is no longer available at the price they were paying for it.

Much smoother to simply downgrade the model and claim you're "tuning" if caught.

caddemon 1117 days ago

Yeah, that makes sense for some products/companies. It just seems shortsighted for OpenAI when they could be solidifying a customer base right now. If they actually degrade the product in the name of "tuning", people will just be more inclined to try alternatives like Bard. An enterprise package could've been a good excuse for them to raise prices too.

Maybe their partnership with Microsoft changes the dynamics of how they handle their direct products though.

boringuser2 1117 days ago

Bard is garbage even compared to 3.5.

OpenAI doesn't have any competitors. The only weakness we've seen is their ability to scale their models to meet demand (hence the increasingly draconian restrictions in the early days of ChatGPT-4).

It makes perfect business sense to address your weak points.
