by bbotond 1117 days ago
Yes. Before the update, when its avatar was still black, it solved pretty complex coding problems effortlessly and gave very nuanced, thoughtful answers to non-programming questions. Now it struggles to change just two lines in a 10-line block of CSS and print the modified block again. Some lines are missing, others are completely different for no reason. I'm sure scaling the model is hard, but they lobotomized it in the process.

The original GPT-4 felt like magic to me; I had this sense of awe while interacting with it. Now it is just a dumb stochastic parrot.


"The original GPT-4 felt like magic to me"

You never had access to that original. Watch this talk by one of the people who integrated GPT-4 into Bing, in which he describes how the GPT-4 releases they received from OpenAI were iteratively and significantly nerfed even during the project.

https://www.youtube.com/watch?v=qbIk7-JPB2c

“You never had access to that original.”

While your overall point is well taken, GP is clearly referring to the original public release of GPT-4 on March 14.

Yes, that was how I read it as well. I was just pointing out that the public release was already extremely nerfed from what was available pre-launch.
Interesting, please expound since very few of us had access pre-launch.
The video I posted referenced this.

In summary: The person had access to early releases through his work at Microsoft Research, where they were integrating GPT-4 into Bing. He used "Draw a unicorn in TikZ" (TikZ is probably the most complex and powerful tool for creating graphics in LaTeX) as a prompt and noticed how the model's responses changed with each release they got from OpenAI. At first the drawings got better and better, but once OpenAI started focusing on "safety", subsequent releases got worse and worse at the task.

That suggests the “nerfing” is not what I would have expected (a final pass to remove badthink) but something that reaches deep into everything, because the question asked should be orthogonal to safety.
I experienced the same thing as a user of the public service. The system could at one point draw something approximating a unicorn in tikz. Now, its renditions are extremely weak, to the point of barely resembling any four-legged animal.
That’s awful. Talk about cutting off your nose to spite your face.
Here's another interview from a guy who had access to the unfiltered GPT-4 before its release. He says it was extremely powerful and would answer any question whatsoever without hesitating.

https://www.youtube.com/watch?v=oLiheMQayNE&t=2849s

Wow, I could only watch the first 15 minutes now but it’s already fascinating! Thanks for the recommendation.
This is for your protection from an extinction level event. Without nerfing the current model they couldn’t charge enterprise level fee structures for access to the superior models, thus ensuring the children are safe from scary AI. Tell your congress person we need to grant Microsoft and Google exclusive monopolies on AI research to protect us from open source and competitor AI models that might erode their margins and lead to the death of all life without their corporate stewardship. Click accept for your safety.
This but unironically.
Try out Bard; its coding has improved a lot in the last 2 weeks. I've unfortunately switched over for the time being.
I just tried Bard based on this comment, and it's really, really bad.

Can you please help me with how you are prompting it?

If you have to worry about prompting, it already tells you everything one needs to know about how good the model is.
I don't think that's true at all. Think of it like setting up conversation constraints to reduce the potential pitfalls for a model. You can vastly improve the capability of just about any LLM I've used by being clear about what you specifically want considered, and what you don't want considered when solving a problem.

It'll take you much farther, by allowing you to incrementally solve your problem in smaller steps while giving the model the proper context required for each step of the problem-solving process, and limiting the things it must consider for each branch of your problem.
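The approach described above (stating explicitly what the model should and should not consider, and feeding it the problem in smaller steps) can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions, not a feature of any particular model or API: the helper names `build_prompt` and `build_step_prompts` and the prompt wording are hypothetical, and actually sending the prompts to ChatGPT or Bard is left out.

```python
# Hypothetical helpers: assemble prompts that spell out constraints,
# producing one prompt per step of the problem.

def build_prompt(task, consider, ignore, step=None):
    """Return a prompt stating the task, the current step, and what to consider or ignore."""
    lines = [f"Task: {task}"]
    if step is not None:
        lines.append(f"Only do this step now: {step}")
    if consider:
        lines.append("Take into account:")
        lines.extend(f"- {item}" for item in consider)
    if ignore:
        lines.append("Do not consider:")
        lines.extend(f"- {item}" for item in ignore)
    return "\n".join(lines)


def build_step_prompts(task, steps, consider, ignore):
    """Return one constrained prompt per step, so each request carries only the context it needs."""
    return [build_prompt(task, consider, ignore, step=s) for s in steps]


if __name__ == "__main__":
    prompts = build_step_prompts(
        task="Change the link color in this 10-line CSS block",
        steps=[
            "identify the rules that set link colors",
            "print the full block with only those lines changed",
        ],
        consider=["keep every other line byte-for-byte identical"],
        ignore=["refactoring or reformatting the CSS"],
    )
    for p in prompts:
        print(p)
        print("---")
```

Keeping each step's prompt self-contained means a bad answer to one step can be retried without re-sending the whole conversation.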

I’ve been seeing similar comments about Bard all over Twitter and social media.

My testing agrees with yours. Almost seems like a sponsored marketing campaign with no truth to it.

After my first day with Bard, I would have agreed with you. But since then, I've found that Bard simply has a lot of variance in answer quality. Sometimes it fails for surprisingly simple questions, or hallucinates to an even worse degree than ChatGPT, but other times it gives much better answers than ChatGPT.

On the first day, it felt like 80% of the responses were in the first (fail/hallucinate) category, but over time it feels more like a 50/50 split, which makes it worth running prompts through both ChatGPT and Bard and selecting the best answer. I don't know if the change is because I learnt to prompt it better or because they improved the models based on all the user chats from the public release; perhaps both.

If I need it to write code, I usually prompt it with something like:

"write me a script in python3 that uses selenium to log into a MyBB forum"

Note: usually the code won't work as-is and you still have to do some editing.

I don't know what you are doing, but Bard is so much faster than OpenAI and its answers are clearer and more succinct.
This is just... false. Bard is not just a little worse than gpt-4 for coding, it's more like several orders of magnitude worse. I can't imagine how you are getting superior outputs from Bard.
Can you give an example of a prompt and the output for each that you find Bard to be better for?
I'd be surprised if he can. Both accounts touting how useful Bard is (okdood64, pverghese) have comment histories that frequently defend or advocate for Google:

Examples:

https://news.ycombinator.com/item?id=35224167#35227068

https://news.ycombinator.com/item?id=35303210#35360467

“Bard isn’t currently supported in your country. Stay tuned!”
The Bard model (Bison) is available without region lock as part of Google Cloud Platform. In addition to being able to call it via an API, they have a similar developer UI to the OpenAI playground to interactively experiment with it.

https://console.cloud.google.com/vertex-ai/generative/langua...

It's also really, really bad and compares poorly even to open-source models right now.
God, what happened to Google. What a fall from grace.

Alpaca is pretty good though.

They have 100,000 employees pretending to work on the past.

They have no leadership at the top. Nobody who can steer the ship to the next land (or even anybody who has a map). Who is actively working at Alphabet that has the authority to kill Google search through self-cannibalization? Absolutely nobody. They're screwed accordingly. It takes an enormous level of authority (think: Steve Jobs) and leadership to even consider intentionally putting a product with $200 billion in sales at risk. The trick, of course, is that it's already at great risk.

They don't know what to do, so they're particularly reactive. It has been that way for a long time though, it's just that Google search was never under serious threat previously, so it didn't really matter as a terminal risk if they failed (eg with their social network efforts; their social networks were reactive).

It's somewhat similar to watching Microsoft under Ballmer and how they lacked direction, didn't know what to do, and were too reactive. You can tell when a giant entity like Google is wandering aimlessly.

Did they release the Codey or Unicorn models publicly yet? Or say when they might do that?
Is that free or do you have to pay?

Also do you need to change the options like Token Limit etc?

It's completely free. No tokens, nothing.
But it can't be used unless I enable billing, which I am not willing to do after reading all the horror stories about people getting billed thousands overnight. I'm not willing to take the risk that I forget some script and it keeps creating charges.
Thank you!
Google's passion for region locking is insane to me
It's a legal thing, not something they want to do.
What law prohibits Google from making Bard available outside the USA?
It's available here in the UK, so it's not USA exclusive.
It's blocked in the EU because they don't want to/can't comply with GDPR.
Eh, more like limiting rollout because they can't/don't want to handle the scale.
Same for me, I’m in Estonia :(
You can use a VPN to use an American connection, it doesn't matter where your Google account is registered.
Not necessarily American, you just have to avoid EU and, I believe, Russia/China/Cuba etc.
I'm in Switzerland and Bard is locked out. We do not go by EU laws because we are not part of the EU; we have plenty of bilateral deals, but still.
Thanks, I’ll try it! (I’m in Hungary)
Google (DeepMind) actually has the people and has developed the science to make the best AI products in the world, but unfortunately Bard seems to have been thrown together in an afternoon by an intern and then handed off to a horde of marketing people. It's not good right now. DeepMind is one of the best scientifically; they just don't really make products. OpenAI is essentially the exact opposite.
No thanks! I have better things to do than feeding that advertising behemoth. What I like about ChatGPT is that I don't see any ads at all!
That you know of.

Don't you worry, if there is any medium, place or mode of interaction people spend time on, advertising will eventually metastasize to it, and will keep growing until it completely devalues the activity and destroys most of the utility it provides.

> What I like about ChatGPT is that I don't see any ads at all!

For now. It's just a marketing tool/demo site, like ITA Matrix was/is. The ads are vended by Bing.

I asked it to review some code a couple of days ago; the comments, while valid English, were nonsense.
Its go-to tactic now, if I ask it to go over any piece of code, is to give a generic overview. Earlier, it would split the code into chunks and go through each one individually.
Yeah, the bing integration did not go well. Went from amazing to annoying.
Aren’t the original weights around somewhere?
Same happened with Dalle-2. It went downhill after a couple of weeks.
No wonder. Is this just the chat interface, or the API too? I guess GPT-4 was never sustainable at $20 a month. It's annoying to be charged the same subscription while the product is made worse.
For enterprise pricing, please contact our sales team today!
I wonder what the unfiltered one is like.

Are they sitting on a near-perfect arbiter of truth? That would be worth hiding.

No.
I just tried a comparison of ChatGPT, Claude and Bard to write a python function I needed for work and ChatGPT (using GPT-4) whined and moaned about what a gargantuan task it was and then did the wrong thing. Claude and Bard gave me what I expected.
If this is true, one should be able to compare with benchmarks or evals to demonstrate this.

Anyone know more about this?

Yeah, I think it's plausible it's gotten worse, but it would also be classic human psychology to perceive degradation because you start noticing flaws once the honeymoon effect wears off.

Unfortunately this will be hard to benchmark unless someone was already collecting a lot of data on ChatGPT responses for other purposes. Perhaps if this is happening the degradation will get worse though, so someone noticing it now could start collecting GPT responses longitudinally.
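Collecting responses longitudinally, as suggested above, could start as simply as re-running a fixed set of prompts on a schedule and logging whether each response passes a check. The sketch below assumes you already have some way to query the model and to grade a response (for example, whether the returned code runs); that part is not shown. The names `EvalRecord`, `append_record`, `load_records`, `pass_rate_by_date` and `detect_drop`, the JSON Lines log format, and the 10-point threshold are all illustrative assumptions.

```python
import json
import statistics
from dataclasses import asdict, dataclass


@dataclass
class EvalRecord:
    run_date: str   # ISO date (YYYY-MM-DD) the prompt was sent
    prompt_id: str  # identifier of a prompt in a fixed eval set
    passed: bool    # whether the response passed your own check


def append_record(path, record):
    """Append one result to a JSON Lines log so results accumulate over time."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_records(path):
    """Read every logged result back into EvalRecord objects."""
    with open(path, encoding="utf-8") as f:
        return [EvalRecord(**json.loads(line)) for line in f if line.strip()]


def pass_rate_by_date(records):
    """Return {run_date: fraction of prompts passed}, ordered by date."""
    by_date = {}
    for r in records:
        by_date.setdefault(r.run_date, []).append(1.0 if r.passed else 0.0)
    return {d: statistics.mean(scores) for d, scores in sorted(by_date.items())}


def detect_drop(rates, threshold=0.1):
    """True if the latest pass rate is more than `threshold` below the earliest one."""
    if len(rates) < 2:
        return False
    dates = sorted(rates)  # ISO date strings sort chronologically
    return rates[dates[0]] - rates[dates[-1]] > threshold
```

Comparing only the first and last runs is deliberately crude. Model outputs are stochastic, so with a small eval set the run-to-run noise can exceed a 10-point swing; keeping the prompt set fixed and looking at the whole series is what would separate real degradation from the honeymoon effect.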

Yes, that's an obvious complication, but it isn't the fault of the humans given that the model can easily be tuned without your knowledge to subjectively perform worse, and there's an obvious incentive for it (compute cost).
Yeah I fully agree about compute cost, though I wonder why they don't just introduce another payment tier. If people are really using it at work as much as claimed online, it would be much preferable to be able to pay more for the full original performance, which seems win/win.
Because that involves telling customers that the product they are paying for is no longer available at the price they were paying for it.

Much smoother to simply downgrade the model and claim you're "tuning" if caught.

Yeah that makes sense for some products/companies. It just seems short sighted for OpenAI when they could be solidifying a customer base right now. If they actually degrade the product in the name of "tuning" people will just be more inclined to try alternatives like Bard. An enterprise package could've been a good excuse for them to raise prices too.

Maybe their partnership with Microsoft changes the dynamics of how they handle their direct products though.