| HN Mirror


	by bbotond 1117 days ago
	Yes. Before the update, when its avatar was still black, it solved pretty complex coding problems effortlessly and gave very nuanced, thoughtful answers to non-programming questions. Now it struggles with just changing two lines in a 10-line block of CSS and printing this modified 10-line block again. Some lines are missing, others are completely different for no reason. I'm sure scaling the model is hard, but they lobotomized it in the process. The original GPT-4 felt like magic to me, I had this sense of awe while interacting with it. Now it is just a dumb stochastic parrot.

8 comments

PeterStuer 1117 days ago

"The original GPT-4 felt like magic to me"

You never had access to that original. Watch this talk by one of the people who integrated GPT-4 into Bing, describing how the GPT-4 releases they got from OpenAI were iteratively and significantly nerfed even during the project.

https://www.youtube.com/watch?v=qbIk7-JPB2c

bumbledraven 1117 days ago

“You never had access to that original.”

While your overall point is well taken, GP is clearly referring to the original public release of GPT-4 on March 14.

PeterStuer 1117 days ago

Yes, that was how I read it as well. I was just pointing out that the public release was already extremely nerfed from what was available pre-launch.

avocade 1117 days ago

Interesting, please expound since very few of us had access pre-launch.

PeterStuer 1117 days ago

The video I posted referenced this.

In summary: The person had access to early releases through his work at Microsoft Research where they were integrating GPT-4 into Bing. He used "Draw a unicorn in TikZ" (TikZ is probably the most complex and powerful tool to create graphic elements in LaTeX) as a prompt and noticed how the model's responses changed with each release they got from OpenAI. While at first the drawings got better and better, once OpenAI started focusing on "safety" subsequent releases got worse and worse at the task.

bombcar 1117 days ago

That indicates the “nerfing” is not what I would think (a final pass to remove badthink) but somehow deep in everything, because the question asked should be orthogonal.

inciampati 1115 days ago

I experienced the same thing as a user of the public service. The system could at one point draw something approximating a unicorn in TikZ. Now, its renditions are extremely weak, to the point of barely resembling any four-legged animal.

pmarreck 1116 days ago

That’s awful. Talk about cutting off your nose to spite your face.

015UUZn8aEvW 1117 days ago

Here's another interview from a guy who had access to the unfiltered GPT-4 before its release. He says it was extremely powerful and would answer any question whatsoever without hesitating.

https://www.youtube.com/watch?v=oLiheMQayNE&t=2849s

bbotond 1117 days ago

Wow, I could only watch the first 15 minutes now but it’s already fascinating! Thanks for the recommendation.

fnordpiglet 1117 days ago

This is for your protection from an extinction level event. Without nerfing the current model they couldn’t charge enterprise level fee structures for access to the superior models, thus ensuring the children are safe from scary AI. Tell your congress person we need to grant Microsoft and Google exclusive monopolies on AI research to protect us from open source and competitor AI models that might erode their margins and lead to the death of all life without their corporate stewardship. Click accept for your safety.

FeepingCreature 1112 days ago

This but unironically.

okdood64 1117 days ago

Try out Bard, its coding is much improved in the last 2 weeks. I've unfortunately switched over for the time being.

AndyNemmity 1117 days ago

I just tried Bard based on this comment, and it's really, really bad.

Can you please help me with how you are prompting it?

moffkalast 1117 days ago

If you have to worry about prompting, it already tells you everything one needs to know about how good the model is.

Tostino 1116 days ago

I don't think that's true at all. Think of it like setting up conversation constraints to reduce the potential pitfalls for a model. You can vastly improve the capability of just about any LLM I've used by being clear about what you specifically want considered, and what you don't want considered when solving a problem.

It'll take you much farther, by allowing you to incrementally solve your problem in smaller steps while giving the model the proper context required for each step of the problem-solving process, and limiting the things it must consider for each branch of your problem.
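As a rough illustration of the constraint-setting described above, here is a minimal Python sketch. The names `build_prompt` and `build_steps` and their parameters are hypothetical, not part of any library, and nothing here calls a model; it only assembles the text you would send.

```python
def build_prompt(task, context, consider, ignore):
    """Assemble a prompt stating the task, its context, and explicit constraints.

    All names are illustrative; this only builds a string.
    """
    lines = [f"Task: {task}", f"Context: {context}"]
    if consider:
        lines.append("Consider only:")
        lines.extend(f"- {item}" for item in consider)
    if ignore:
        lines.append("Do not consider:")
        lines.extend(f"- {item}" for item in ignore)
    return "\n".join(lines)


def build_steps(steps, shared_context):
    """Build one prompt per small step, each carrying the shared context."""
    return [build_prompt(step, shared_context, [], []) for step in steps]
```

Each step's output would then be pasted into the context of the next prompt, so the model only has to reason about one branch of the problem at a time.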

300bps 1117 days ago

I’ve been seeing similar comments about Bard all over Twitter and social media.

My testing agrees with yours. Almost seems like a sponsored marketing campaign with no truth to it.

sundarurfriend 1116 days ago

After my first day with Bard, I would have agreed with you. But since then, I've found that Bard simply has a lot of variance in answer quality. Sometimes it fails for surprisingly simple questions, or hallucinates to an even worse degree than ChatGPT, but other times it gives much better answers than ChatGPT.

On the first day, it felt like 80% of the responses were in the first (fail/hallucinate) category, but over time it feels more like a 50/50 split, which makes it worth running prompts through both ChatGPT and Bard and selecting the better answer. I don't know if the change is because I learnt to prompt it better, or if they improved the models based on all the user chats from the public release - perhaps both.

m4jor 1117 days ago

If it needs to write code, I usually prompt it with something like:

"write me a script in python3 that uses selenium to log into a MyBB forum"

Note: usually it will not run as written and you still have to do some editing.

pverghese 1117 days ago

Don't know what you are doing? But Bard is so much faster than OpenAI and its answers are clearer and more succinct.

minihat 1117 days ago

This is just... false. Bard is not just a little worse than GPT-4 for coding, it's more like several orders of magnitude worse. I can't imagine how you are getting superior outputs from Bard.

BeefySwain 1117 days ago

Can you give an example of a prompt and the output for each that you find Bard to be better for?

300bps 1117 days ago

I'd be surprised if he can. Both accounts that are purporting how useful Bard is (okdood64, pverghese) have comment histories defending or advocating for Google frequently:

Examples:

https://news.ycombinator.com/item?id=35224167#35227068

https://news.ycombinator.com/item?id=35303210#35360467

bbotond 1117 days ago

“Bard isn’t currently supported in your country. Stay tuned!”

scottfr 1117 days ago

The Bard model (Bison) is available without region lock as part of Google Cloud Platform. In addition to being able to call it via an API, they have a similar developer UI to the OpenAI playground to interactively experiment with it.

https://console.cloud.google.com/vertex-ai/generative/langua...

technics256 1117 days ago

It's also really, really bad and fails compared to even open-source models right now.

local_crmdgeon 1117 days ago

God, what happened to Google? What a fall from grace.

Alpaca is pretty good though.

adventured 1117 days ago

They have 100,000 employees pretending to work on the past.

They have no leadership at the top. Nobody that can steer the ship to the next land (or even anybody that has a map). Who is actively working at Alphabet that has the authority to kill Google search through self-cannibalization? Absolutely nobody. They're screwed accordingly. It takes an enormous level of authority (think: Steve Jobs) and leadership to even consider intentionally putting at risk a $200 billion sales product. The trick of course is that it's already at great risk.

They don't know what to do, so they're particularly reactive. It has been that way for a long time though. It's just that Google search was never under serious threat previously, so it didn't really matter as a terminal risk if they failed (e.g. with their social network efforts; their social networks were reactive).

It's somewhat similar to watching Microsoft under Ballmer and how they lacked direction, didn't know what to do, and were too reactive. You can tell when a giant entity like Google is wandering aimlessly.

ilaksh 1117 days ago

Did they release the Codey or Unicorn models publicly yet? Or say when they might do that?

sumedh 1117 days ago

Is that free or do you have to pay?

Also do you need to change the options like Token Limit etc?

pverghese 1117 days ago

It's completely free. No tokens nothing.

Gasp0de 1117 days ago

But it can't be used unless I enable billing, which I am not willing to do after reading all the horror stories about people getting billed thousands overnight. I'm not willing to take the risk that I forget some script and it keeps creating charges.

Thank you!

Google's passion for region locking is insane to me

bwb 1117 days ago

It's a legal thing, not something they want to do.

EForEndeavour 1117 days ago

What law prohibits Google from making Bard available outside the USA?

gambiting 1117 days ago

It's available here in the UK, so it's not USA exclusive.

simse 1117 days ago

It's blocked in the EU because they don't want to/can't comply with GDPR.

underdeserver 1117 days ago

Eh, more like limiting rollout because they can't/don't want to handle the scale.

sintezcs 1117 days ago

Same for me, I’m in Estonia :(

corgihamlet 1117 days ago

You can use a VPN to use an American connection, it doesn't matter where your Google account is registered.

airgapstopgap 1117 days ago

Not necessarily American, you just have to avoid EU and, I believe, Russia/China/Cuba etc.

column 1117 days ago

I'm in Switzerland and Bard is locked out, we do not go by EU laws because we are not part of the EU. We have plenty of bilateral deals but still.

bbotond 1117 days ago

Thanks, I’ll try it! (I’m in Hungary)

chaxor 1117 days ago

Google (DeepMind) actually has the people and has developed the science to make the best AI products in the world, but unfortunately Bard seems to have been thrown together in an afternoon by an intern, and then handed off to a horde of marketing people. It's not good right now. DeepMind is one of the best scientifically, they just don't really make products. OpenAI is essentially the direct opposite of that.

qwepjn2oi3j 1117 days ago

No thanks! I have better things to do than feeding that advertising behemoth. What I like about ChatGPT is that I don't see any ads at all!

TeMPOraL 1117 days ago

That you know of.

Don't you worry, if there is any medium, place or mode of interaction people spend time on, advertising will eventually metastasize to it, and will keep growing until it completely devalues the activity and destroys most of the utility it provides.

arcticbull 1117 days ago

> What I like about ChatGPT is that I don't see any ads at all!

For now. It's just a marketing tool/demo site, like ITA Matrix was/is. The ads are vended by Bing.

ape4 1117 days ago

I asked it to review some code a couple of days ago. The comments, while valid English, were nonsense.

spaceman_2020 1117 days ago

Its go-to tactic now, if I ask it to go over any piece of code, is to give a generic overview. Earlier, it would section out the code into chunks and go through each one individually.

datavirtue 1117 days ago

Yeah, the Bing integration did not go well. Went from amazing to annoying.

EGreg 1117 days ago

Aren’t the original weights around somewhere?

anibalin 1117 days ago

Same happened with DALL-E 2. It went downhill after a couple of weeks.

Rastonbury 1117 days ago

No wonder. Is this just the chat interface or the API too? I guess GPT-4 was never sustainable at $20 a month. Annoying to be charged the same subscription for a product that has been made inferior.

berniedurfee 1117 days ago

For enterprise pricing, please contact our sales team today!

local_crmdgeon 1117 days ago

I wonder what the unfiltered one is like.

Are they sitting on a near-perfect arbiter of truth? That would be worth hiding.

smolder 1116 days ago

No.

jonathan-kosgei 1115 days ago

I just tried a comparison of ChatGPT, Claude and Bard to write a Python function I needed for work, and ChatGPT (using GPT-4) whined and moaned about what a gargantuan task it was and then did the wrong thing. Claude and Bard gave me what I expected.

dr_dshiv 1117 days ago

If this is true, one should be able to compare with benchmarks or evals to demonstrate this.

Anyone know more about this?

caddemon 1117 days ago

Yeah, I think it's plausible it's gotten worse, but it would also be classic human psychology to perceive degradation because you start noticing flaws after the honeymoon effect wears off.

Unfortunately this will be hard to benchmark unless someone was already collecting a lot of data on ChatGPT responses for other purposes. Perhaps if this is happening the degradation will get worse though, so someone noticing it now could start collecting GPT responses longitudinally.
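A minimal sketch of what that longitudinal collection could look like, assuming you obtain each response yourself (by API or copy-paste) and just want a timestamped log to compare later. The function names, the JSON Lines layout, and the `check` callback are all hypothetical choices for illustration, not an established eval harness.

```python
import json
import time


def record_response(log_path, prompt, response, model_label, timestamp=None):
    """Append one prompt/response pair to a JSON Lines log file."""
    entry = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "model": model_label,
        "prompt": prompt,
        "response": response,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def load_responses(log_path, prompt):
    """Return all logged entries for one fixed prompt, oldest first."""
    entries = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["prompt"] == prompt:
                entries.append(entry)
    return sorted(entries, key=lambda e: e["timestamp"])


def pass_rate(entries, check):
    """Fraction of entries whose response passes a user-supplied check."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if check(e["response"])) / len(entries)
```

Re-running the same fixed prompts every week and comparing `pass_rate` across time windows would give at least a crude signal that does not depend on anyone's memory of how good the answers used to be.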

boringuser2 1117 days ago

Yes, that's an obvious complication, but it isn't the fault of the humans given that the model can easily be tuned without your knowledge to subjectively perform worse, and there's an obvious incentive for it (compute cost).

caddemon 1117 days ago

Yeah I fully agree about compute cost, though I wonder why they don't just introduce another payment tier. If people are really using it at work as much as claimed online, it would be much preferable to be able to pay more for the full original performance, which seems win/win.

boringuser2 1117 days ago

Because that involves telling customers that the product they are paying for is no longer available at the price they were paying for it.

Much smoother to simply downgrade the model and claim you're "tuning" if caught.

caddemon 1117 days ago

Yeah, that makes sense for some products/companies. It just seems short-sighted for OpenAI when they could be solidifying a customer base right now. If they actually degrade the product in the name of "tuning", people will just be more inclined to try alternatives like Bard. An enterprise package could've been a good excuse for them to raise prices too.

Maybe their partnership with Microsoft changes the dynamics of how they handle their direct products though.
