| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sebgr 982 days ago
	For coding it is still 10x worse than gpt4. I asked it to write a simple database sync function and it gives me tons of pseudocode like `//sync object with best practices`. When I ask it to give me real code it forgets tons of key aspects.

7 comments

swatcoder 982 days ago

Because they're ultimately training data simulators and not actually brilliant aritifical programmers, we can expect Microsoft-affiliated models like ChatGPT4 and beyond to have much stronger value for coding because they have unmediated access to GitHub content.

So it's most useful to look at other capabilities and opportunities when evaluating LLM's with a different heritage.

Not to say we shouldn't evaluate this one for coding or report our evaluations, but we shouldn't be surprised that it's not leading the pack on that particular use case.

YetAnotherNick 982 days ago

Github full (public) scrape is available to anyone. GPT-4 was trained before Microsoft deal so I don't think it is because of Github access. And GPT-4 is significantly better in everything compared to second best model for that field, not just coding.

avita1 982 days ago

Is this practically true? Yes, anyone can clone any repo from Github, but surely scraping all of Github would run into rate limits?

The terms and conditions say as much https://docs.github.com/en/site-policy/github-terms/github-t...

vineyardmike 982 days ago

Well today you get to learn about the GitHub Archive project, which creates dumps of all GitHub data.

One example is the data hosted in Google Cloud.

https://cloud.google.com/blog/topics/public-datasets/github-...

threeseed 982 days ago

And there is no evidence that Github is violating any open source licenses.

So they are going to be training on exactly the same data that is available to all.

whimsicalism 982 days ago

idk we're just "have more kids" simulators and we do pretty good at programming as a side-task

swatcoder 982 days ago

Sure, and those of us who have more robust preparation and expoure generally do a better job of it.

preommr 982 days ago

Someone doesn't get good at programming with low quality learning sources. Also, a poor comparison because models are not people - might as well complain about how NPCs in games behave because they fail at problems real people can solve.

whimsicalism 982 days ago

We are both substrate that has been aggressively optimized for a task with a lot of side benefits. "NPC"s are not optimized at all, they are coded using symbolic rules/deterministic behavior.

ironrabbit 982 days ago

Zero chance private github repos make it into openai training data, can you imagine the shitshow if GPT-4 started regurgitating your org's internal codebase?

nomel 982 days ago

Org specific AI is, almost certainly, the killer app. This will have to be possible at some point, or OpenAI will be left in the dust.

whimsicalism 982 days ago

You are downvoted but I agree.

diplodinkus 982 days ago

Agreed, but I do find gpt4 has been increasing the amount of pseudo code recently. I think they are a/b testing me. I find myself asking if how much energy it wasted giving me replies that I then have to tell it to fix.. Which is of course a silly thing to do, but maybe someone at oAI is listening?

FrenchDevRemote 982 days ago

If you mean through the user friendly chat GPT website, they're probably making it output as few tokens as possible to cut costs

FrustratedMonky 982 days ago

That can't be, because I can ask it a simple question that an answer is maybe 1 sentence, and it repeats the question then provides a whole novel. So ton of tokens.

madeofpalk 982 days ago

GPT still writes like a highschooler trying to hit a high word count :(

droopyEyelids 982 days ago

Like a content mill trying to keep you on the page for as long as possible! Which it was trained on.

gtirloni 982 days ago

You can ask it to be very concise.

I added it to my custom instructions and it has helped a lot.

gumballindie 982 days ago

Wow, imagine paying so they can experiment on you and limit what you get. I so wish i found such … useful clients for my own projects.

FrenchDevRemote 982 days ago

It's not experimentation, it's probably one of the only things that allowed them to make gpt 3.5 turbo 10 TIMES cheaper than the previous model.

wouldbecouldbe 982 days ago

Yeah but to be honest been a pain last days to get gpt 4 to write full pieces of code for more the 10-15 lines. Have to re-ask many times and at some point it forgets my initial specifications.

s1gnp0st 982 days ago

Earlier in the year I had ChatGPT 4 write a large, complicated C program. It did so remarkably well, and most of the code worked without further tweaking.

Today I have the same experience. The thing fills in placeholder comments to skip over more difficult regions of the code, and routinely forgets what we were doing.

Aside all the recent OpenAI drama, I've been displeased as a paying customer that their products routinely make their debut at a much higher level of performance than when they've been in production for a while.

One would expect the opposite unless they're doing a bad job planning capacity. I'm not diminishing the difficulty of what they're doing; nevertheless, from a product perspective this is being handled poorly.

parkerrex 982 days ago

Definitely degraded. I recommend being more specific in your prompting. Also if you have threads with a ton of content, they will get slow as molasses. It sucks but giving them a fresh context each day is helpful. I create text expanders for common prompts / resetting context.

eg: Write clean {your_language} code. Include {whatever_you_use} conventions to make the code readable. Do not reply until you have thought out how to implement all of this from a code-writing perspective. Do not include `/..../` or any filler commentary implying that further functionality needs to be written. Be decisive and create code that can run, instead of writing placeholders. Don't be afraid to write hundreds of lines of code. Include file names. Do not reply unless it's a full-fledged production ready code file.

zarzavat 982 days ago

These models are black boxes with unlabeled knobs. A change that makes things better for one user might make things worse for another user. It is not necessarily the case that just because it got worse for you that it got worse on average.

Also, the only way for OpenAI to really know if a model is an improvement or not is to test it out on some human guinea pigs.

eyegor 982 days ago

My understanding is they reduced the number of ensembles feeding gpt4 so they could support more customers. I want to say they cut it from 16 to 8. Take that with a grain of salt, that comes through the rumor telephone.

Are you prompting it with instructions about how it should behave at the start of a chat, or just using the defaults? You can get better results by starting a chat with "you are an expert X developer, with experience in xyz and write full and complete programs" and tweak as needed.

s1gnp0st 982 days ago

Yep, I'm still able to contort prompts to achieve something usable; however, I didn't have to do that at the beginning, and I'd rather pay $100/mo to not have to do so now.

CSMastermind 982 days ago

Agreed OpenAI products have a history of degrading in quality over time.

sp332 982 days ago

OpenAI just had to pause signups after demo day because of capacity issues. They also switched to making users pay in advance for usage instead of billing them after.

refulgentis 982 days ago

They aren't switching anything with payments. Bad rumor amplified by social contagion and a 100K:1 ratio of people talking about it to people building with it.

hansvm 981 days ago

They told me they were switching and haven't sent anything since to the contrary.

vanviegen 982 days ago

Could the (perceived) drop in quality be due to ChatGPT switching from GPT-4 to GPT-4-turbo?

wouldbecouldbe 982 days ago

Im not really sure what chatgpt+ is serving me. There was a moment it was suddenly blazing fast, that was around the time turbo came out. Off late, it's been either super slow or super fast randomly.

nomel 982 days ago

Try using the playground, with a more code specific system prompt, or even put key points/the whole thing into the system prompt. I see better performance, compared to the web.

nmfisher 982 days ago

This was one of the main reasons I cancelled my ChatGPT Pro subscription in favour of Claude…but unfortunately Claude is now doing the same thing too.

nafizh 982 days ago

This has exactly been my experience for at least the last 3 months. At this point, I am thinking if paying that 20 bucks is even worth anymore which is a shame because when gpt-4 first came out, it was remembering everything in a long conversation and self-correcting itself based on modifications.

hobo_mark 982 days ago

Since I do not use it every day, I only pay for API access directly and it costs me a fraction of that. You can trivially make your own ChatGPT frontend (and from what people write you could make GPT write most of the code, although it's never been my experience).

mercer 982 days ago

same. what would you use as an alternative?

ren_engineer 982 days ago

definitely noticed it being "lazy" in the sense it will give the outline for code and then literally put in comments telling me to fill out the rest, basically pseudocode. Have to assume they are trying to save on token output to reduce resources used when they can get away with it

squeaky-clean 982 days ago

Even when I literally ask it for code it will often not give me code and will give me a high level overview or pseudocode until I ask it again for actual code.

It's pretty funny that my second message is often "that doesn't look like any programming language I recognize. I tried running it in Python and got lots of errors".

"My apologies, that message was an explanation of how to solve your problem, not code. I'll provide a concrete example in Python."

charlesischuck 982 days ago

You should read how the infrastructure of gpt works. In peak times you response quality will drop. Microsoft has a few whitepapers on it.

Ideal output is when nobody elese is using the tool.

taf2 982 days ago

noticing the same - what about with gpt-4 via api?

johnisgood 982 days ago

I had one chat with ChatGPT 3.5 where it would tell me the correct options (switches) to a command, and then a couple weeks later it is telling me this (in the same chat FWIW):

> As of my last knowledge update in September 2021, the XY framework did not have a --abc or --bca option in its default project generator.

Huh...

inciampati 982 days ago

Except: you can feed it an entire programming language manual, all the docs for all the modules you want to use, and _then_ it's stunningly good, whipping chatgpt4 that same 10x.

michaelt 982 days ago

I gather the pricing is $8 for a million input tokens [1] so if your language's manual is the size of a typical paperback novel, that'd be about $0.8 per question. And presumably you get to pay that if you ask any follow-up questions too.

Sounds like a kinda expensive way of doing things, to me.

[1] https://www-files.anthropic.com/production/images/model_pric...

infecto 982 days ago

From my perspective it sounds pretty cheap if we get to the answers immediately.

esafak 982 days ago

Have you tried it? GPT4 fails as often as it succeeds at coding questions I ask so I'm not going to shell out that kind of money to take my chances.

infecto 982 days ago

Claude? No, have requested access many times but radio silence.

OpenAI? I use ChatGPT A LOT for coding as some mixture of pair programmer and boilerplate, works generally well for me. On the API side use it heavily for other work and its more directed and have a very high acceptance rate.

cowthulhu 982 days ago

If you need a lot of revisions/tweaks, the price could be pretty prohibitive.

FrustratedMonky 982 days ago

Can you just tell it to focus on a particular language and have it go find the manuals? If it is so easy to add manuals, maybe they should just make options to do that for you.

chubot 982 days ago

How do you do this? Links / more info?

davedx 982 days ago

I honestly don’t have time for that level of prompt engineering. So, chatGPT wins (for me)

roflyear 982 days ago

Right "may as well do it myself" - I think this is the natural limit these things will reach. Just my opinion.

machiaweliczny 982 days ago

Yeah but if their model would be accessible it would already have good vscode extension

p1esk 982 days ago

Gpt4 has 128k context length now.

whimsicalism 982 days ago

gpt4 turbo

vasili111 982 days ago

Am I only one that thinks that Claude 2 is not bad for programming questions? I do not think it is best one for programming questions but I do not think that it is bad too. I have received multiple times very good response from Claude 2 on Python and SQL.

dinvlad 982 days ago

I find all of them, gpt4 or not, just suck, plain and simple. They are only good for only the most trivial stuff, but any time the complexity rises even a little bit they all start hallucinate wildly and it becomes very clear they're nothing more than just word salad generators.

charlesischuck 982 days ago

I have built large scale distributed gpu (96gpus per job) dnn systems and worked on very advanced code bases.

GPT4 massively sped up my ability to create this.

It is a tool and it takes a lot of time to master it. Took me around 3-6 months of every day use to actually figure out how. You need to go back and try to learn it properly, it's easily 3-5x my work output.

jpeter 982 days ago

Including all of Github in your training dataset seems like a good idea