Hacker News new | ask | show | jobs
by p1esk 1137 days ago
When people say "ChatGPT" it's important to clarify if they mean GPT-4 (paid version), or the free GPT-3.5. GPT-4 is a completely different level, it's miles better in every way. It's like comparing a 10 year old to a 20 year old.
3 comments

It is better and has become a valuable resource to me (via the API).

Currently it suffers from being very expensive (I blew through my 30€ monthly budget in a few hours of intensive use for coding).

Also it still is very confidently wrong in sometimes subtle but fundamental ways.

My recent example is this:

I had success with developing rust proc macros with it. I don't know much about them but as a developer I can read the generated code just fine.

Yesterday I wanted to code a macro that adds an attribute to fields in an existing struct. It's actually not possible to do that but gpt-4 send me down a wrong track by "fixing" it's bugs when asked to without getting anywhere.

Asking it whether this is even possible is unreliable because in such niche cases it'll flip flop between answers.

Copilot has become a valuable tool and I've learned a lot by using both versions of GPT

Can you talk a bit about how you use the API for coding? I have API access, but I'm not entirely sure how to use it to great effect. How does it fit into your workflow?
Mostly for use cases which are isolated and with a technology I'm not very familiar (like rust proc macros) and which are hard to Google.

I'll give it a system prompt in the spirit of "you assist experts with developing software. Be brief and assume expertise".

I've found it to work well for smaller, contained problems or one off scripts where the alternative would have been to do it manually or not at all. Getting there 80% allows me to start them in the first place.

Another random example: Recently I needed a script to transfer some secrets from one k8s cluster to several others. It took about 3 minutes with GPT 4 and solved the problem within one iteration. There is probably a one liner in bash to do it but I don't know it of the top of my head;)

Copilot massively improved the quality of my logging and commenting

I got the impression that they gimped 3.5 severely over time. Since 4 is still restricted to 25 messages / 3 hours (for paying customers!), I sometimes fall back on 3.5. Of course it's impossible to prove, but it feels like it's failing hilariously at tasks it could do easily a few months ago.

I wonder if more people have this suspicion or if it's just my imagination?

I would recommend signing up to GPT-4 API access and, upon hopefully getting it, using a third-party frontend like https://bettergpt.chat/ rather than the official ChatGPT page.

You'll never hit a ratelimit as far as I can tell, and it's usage-based so it will probably come out cheaper than $20/mo for regular usage.

I definitely noticed a drop in quality when the gimped (but presumably dramatically cheaper to run) GPT-3.5-turbo model was introduced on the free version. As a paying subscriber I think you should still have access to the original GPT-3.5 (as "Legacy"), have you compared them?
I don't think there are two versions of GPT-3.5. it seems all just code-davinci-002 with fine tuning on top.

https://platform.openai.com/docs/model-index-for-researchers

It could be from training if more to be safer. This was noted by Microsoft early on with GPT-4. Specifically, when looking at the tikz unicorn qualitative benchmark, the unicorn got better with more epochs, which is obviously expected.

However, very interestingly, the unicorn image got far worse when they trained the model to be safer by trying to correct discrimination against various demographics.

This isn't very intuitive to me why that may occur, and seems to conflict with what has been shown in ROME, etc. So I'm surprised it hasn't been commented upon more. It's certainly one of the best examples of how we don't understand what's going on with these models, and it causes very unexpected outcomes.

The tl;dr that I recall is that the current 3.5 sacrifices some things for efficiency. The older 3.5 that's being phased out (but I think is still accessible in the UI) was the original one which I assume was too expensive or risky to keep running as-is.

I didn't copy a reference, just been reading AI topics on HN lately.

I think you're exaggerating a bit

it's good, but the difference is not that big