
by JimDabell 754 days ago

I normally ask about building a multi-tenant system using async SQLAlchemy 2 ORM where some tables are shared between tenants in a global PostgreSQL schema and some are in a per-tenant schema.

Nothing gets it right first time, but when ChatGPT 4 first came out, I could talk to it more and it would eventually get it right. Not long after that though, ChatGPT degraded. It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.

Since then, benchmarks came out showing that ChatGPT “didn’t really degrade”, but all of them seemed focused on single question/answer pairs and not actual multi-turn chat. In my experience, for this kind of thing, ChatGPT 4 has never recovered to how good it was when it was first released.

It’s been months since I’ve had to deal with that kind of code, so I might be forgetting something, but I just tried it with Codestral and it spat out something that looked reasonable very quickly on its first try.

2 comments

alephxyz 754 days ago

>It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.

That drives me nuts and makes me ragequit about half the time. It's usually more effective to go back and correct your initial prompt than to prompt it again, though.


checkyoursudo 754 days ago

I had a similar experience. I was trying to get GPT-4 to write some R/Stan code for a bit of Bayesian modelling. It would get the model wrong, and then I would walk it through how to do it right. By the end it would almost get it right, but on the next step it would say, in effect, "oh, this is what you want", and the output was identical to the first wrong attempt, which would start the loop over again.


happypumpkin 754 days ago

Similar experience using GPT-4 for help with Apple's Accessibility API. I wanted to do some non-happy-path things, and it kept looping between solutions that each failed to satisfy at least one of a handful of requirements I had, in ways that meant I couldn't combine the different "solutions" to meet all of them.

I was eventually able to figure it out with the help of some early 2010s blog posts. Sadly I didn't test giving it that context and having it attempt to find a solution again (and this was before web browsing was integrated with the web app).

The bigger issue wasn't that it didn't know enough to fulfill my request (it was pretty obscure, so I didn't necessarily expect that it would). It was that it didn't mind emitting solutions that failed to meet the requirements. "I don't know how to do that" would have been a much better answer.


kristianp 752 days ago

This seems like an important failure mode to me. I too have noticed GPT-4 looping between a few different failure cases; in my case it was state transitions in JS code. Explaining to it what it did wrong didn't help.
