| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by netsec_burn 728 days ago

Opus remained better than GPT for me, even after the release of GPT-4o. VERY happy to see an even further improvement beyond that, Claude is a terrific product and given the news that GPT-5 only began its training several weeks ago I don't see any situation where Anthropic is dethroned in the near term. There are only two parts of Anthropic's offering I'm not a fan of:

- Lack of conversation sharing: I had a conversation with Claude where I asked it to reverse engineer some assembly code and it did it perfectly on the first try. I was stunned, GPT had failed for days. I wanted to share the conversation with others but there's no way provided like GPT, and no way to even print the conversation because it cuts off on the browser (tested on Firefox).

- No Android app. They're working on this but for now, there's only an iOS app. No expected ETA shared, I've been on the waitlist.

I feel like both of these are relatively basic feature requests for a company of Anthropic's size, yet it has been months with no solution in sight. I love the models, please give me a better way of accessing them.

13 comments

sk11001 728 days ago

Both GPT-4 and 4o have been completely useless for coding in the past couple of weeks for me - constant errors, and not just your typical LLM inaccuracies but incapable of producing a few lines of self-consistent code e.g. defines variables foo on one line and refers to it as bar on the next, or it misspells it as foox.

labrador 728 days ago

Waht language? Because I'm guessing they work well for languages with a large amount of training data like Python (in my experience), less well for less used languages like Zig or Clojure (haven't tried them but that's my theory)

rads 728 days ago

From my experience, GPT-4 works well with both Clojure and Zig. A lot of it depends on the way you prompt though. For example, asking to start with a C or C++ example and converting to Zig often works better than starting straight with Zig. The same strategy works with Java and Clojure too.

ModernMech 728 days ago

I use it for Rust and it's.... meh. It gets things wrong enough that I don't reach for it except to help me reference certain docs. It tends to hallucinate APIs and semantics that just don't exist. Honestly couldn't imagine using it with a dynamic language.

ndr_ 727 days ago

Python here. And like they said, only noticable in the last few weeks.

heyitsguay 727 days ago

I've been seeing this too. Always hard to tell what's a real change vs the rolls of the dice lately but I've been having weird python inconsistencies too, in very short snippets doing pretty simple things.

esafak 728 days ago

For me it has been very repetitious despite my instruction to the contrary.

Zetaphor 727 days ago

I've been experiencing bizarre typos and misspellings that I've come to describe as the model being drunk. Things like it writing peremeter instead of parameter

hombre_fatal 727 days ago

Yeah, misspellings were something so rare that I thought an LLM was incapable of producing them.

Yet over the past few weeks GPT-4 and 4o make them all the time. It will randomly change my postgres schema from public to publish. And, well, just this one for yourself:

> *Using the 'kubectl cp Command*: Execute the 'czygk cp' command to copy the file from your local machine to the pod.

Today, I asked 4o how to get around conditionally executing React hooks (illegal in React) and it rewrote my code to simply do it again but it merely swapped the order of a ternary, performance possibly worse than gpt3.

Maybe they’re weakening it because they expanded their free tier, but it has become surprisingly bad.

kake25 723 days ago

The level of misspelling is insane at the moment. It does it almost 50%+ of the times. I just started using claude 3.5 and the difference is night and day.

ipsum2 728 days ago

It's the same model though. Maybe your perception has changed.

ndr_ 727 days ago

I have first noticed logprob fluctuations in GPT-4o. Perhaps the same phenomenon is also going on with Turbo. I din‘t recall specifics but it was naming inconsistencies with variable names, meaning: same variable name got a typo somewhere, but the typo was close enough - perhaps a space vs. an underscore or something like that.

Model could be the same, but maybe some in the infra is different.

great_psy 727 days ago

I can’t speak for what OpenAI is doing, but I’ve noticed those types of hallucinations occurring when I quantize a model beyond a certain point.

Maybe they are trying to cut down on memory usage ?

edub 727 days ago

Is it the same? On the Models page of the API docs it says that GPT-4 is using the June 13th which would be different than the March 23rd.

Alifatisk 727 days ago

> I had a conversation with Claude where I asked it to reverse engineer some assembly code and it did it perfectly on the first try. I was stunned

I share the same experience with you but with Claude 3 Sonnet. I can’t count how many times I’ve shared some code with Claude with barely any hope because other GPTs failed aswell, yet, Claude surprised me and performed the task with success.

I’ve actually reached to the point that I expressed my gratitude to Claude because of how well it performs on coding tasks and other tasks in general. I don’t know what Anthropic did, but something did they right.

Being able to handle large amounts of tokens, “understand” and perform tasks on it & spit out large amounts of data back with barely any cut-offs (unlike Gemini) has made me feel like Claude is at the moment the best option.

SubiculumCode 728 days ago

I do wonder if GPT quality fluctuates seasonally, or with electricity costs, in an engineering effort to balance costs with performance.

I agree on all your points, but would like to emphasize that I really do enjoy the voice input voice output thing that chatgpt's app has. Its not how I use it when working, but when commuting, a lot of times, I'll turn on the the chatgpt app and have a conversation with it exploring ideas related to work or side projects. Its better than NPR, and I can't listen to the '3d6 Down the Line' podcast everyday, just once a week.

I've been subscribed to PHind, which is a decent service allowing access to their models, chatgpt 4 turbo and o, and claudes. Its been incredibly useful, especially with their search integration. Unfortunately, while chatgpt can be used 500 times a day, Claude is only 10, although I guess it goes into an API like payment mode after that on top of subscription.

I sure wish I'd buckle down and calculate my usage to really get an idea of whether subscription is cheaper or more expensive for me compared to API.

lxgr 727 days ago

Short of switching between models (which at least OpenAI definitely does for free customers, but I believe they always indicate it), how would that work? Different quantizations?

SubiculumCode 727 days ago

caught me speculating. I suppose some mild quanting and/or prompt injection to keep responses smaller unless specifically asked: e.g. use ...

henry_viii 728 days ago

> Lack of conversation sharing... [there is] no way to even print the conversation because it cuts off on the browser (tested on Firefox).

Until they make conversations shareable, in the meantime you can print the whole page in Chrome by:

- going to Developer Tools (Ctrl + Shift + I)

- opening the Command Palette (Ctrl + Shift + P)

- searching for 'screenshot'

- selecting Capture full size screenshot

coreylane 728 days ago

I recently released Slackrock [https://github.com/coreylane/slackrock] that you may find helpful, it's a Slack chat app that can access several FMs (including Claude 3.5) via AWS Bedrock. Responses can be easily shared with others by inviting them to your channels, and Slack has an Android app. It doesn't support attachments (yet) but I'm working on it!

natsucks 728 days ago

cool!

wonderfuly 727 days ago

> Lack of conversation sharing

You can use my product https://ChatHub.gg which supports dozens of chatbots including Claude and can share conversations from any of them.

trungdq88 727 days ago

If you have an API key, using Opus with a 3rd party UI like typingmind.com solves all of the problems you mentioned (disclaimer: I'm the app developer)

lannisterstark 727 days ago

I use LibreChat for this as self hosted UI. Works awesome.

mac-attack 728 days ago

I'm sticking w/ Claude for the foreseeable future as they seem less slimy than OpenAI/Microsoft/Google so far and care about safety.

I'm in the same boat waiting for an Android app btw. One other feature that I'm hoping they catch up to others on is a permanent context window so that I can get Claude to stop speaking so formally all the time

joshstrange 728 days ago

To each their own, but I still prefer ChatGPT. The UI for Claude is terrible in my opinion.

I had subscriptions for both and I would fire off questions to both of them and see which one I liked more and I consistently liked the ChatGPT ones more. I canceled my subscription last week for Claude. I am super happy that Anthropic continues to push the envelope on this and I hope to re-subscribe to them in the future.

spidersouris 728 days ago

If it's really only the UI that's bothering you, why not use a web UI such as Open WebUI?

joshstrange 728 days ago

The UI wasn’t the only issue, but I will look into that.

Powdering7082 728 days ago

> GPT-5 only began its training several weeks ago

Source?

netsec_burn 728 days ago

https://openai.com/index/openai-board-forms-safety-and-secur... (May 28th)

> OpenAI has recently begun training its next frontier model and we anticipate the resulting systems to bring us to the next level of capabilities on our path to AGI.

gagagaga7 728 days ago

No doubt openai have been training big models for the last year. If “gpt5” is only just starting it means recent training runs have had disappointing results and have been passed off as “Gpt4o” or whatever.

The value of all the AI companies is predicated on high chance of AGI, and gpt5 failing to be revolutionary may pop the whole bubble (+10 trillion of market cap)

Workaccount2 728 days ago

Sam said on Lex's podcast that people should temper their expectations for GPT-5, not in that it will necessarily suck, but that they want to ramp up ability slowly over time rather than discrete large steps.

letitgo12345 727 days ago

Sounds like an excuse tbh. Esp when other companies are pushing ahead beyond OAI and open source is close to rivaling them

cpeterso 727 days ago

Yeah. Sam wants to productize $$$ what they have now rather than sink time and money training future models with uncertain outcomes. I suspect that difference in focus is what Ilya Sutskever means by wanting to “advance capabilities as fast as possible” in the in Safe Superintelligence Inc. announcement.

cma 715 days ago

> If “gpt5” is only just starting it means recent training runs have had disappointing results and have been passed off as “Gpt4o” or whatever.

Sora probably took a lot of cluster time don't you think?

cadence- 728 days ago

Based on other things they said in the last couple of months, it looks like GPT-4.5 is coming this summer, and then GPT-5 in the Fall.

stuckinhell 728 days ago

I've had way better success with GPT-4o than claude. I wonder why

netsec_burn 728 days ago

Have you tried 3 Opus or 3.5 Sonnet? Are you using it for programming, or something else?

stuckinhell 724 days ago

everything really. just opus so far

simonw 728 days ago

Personal prompting style, I imagine,

Workaccount2 728 days ago

People really, really, underestimate how important prompting is.

I would be confident in stating that half the people who complain about a model are actually just suffering from poor prompting.

prng2021 727 days ago

And what makes you so confident that all those people are using different prompt styles when comparing models? You think most people don’t even understand the bare basics of how to compare two products?

simonw 727 days ago

That's the point: maybe someone has a personal prompting style that works great with Claude but gives worse results with GPT-4.

They might complain that GPT-4 is rubbish in comparison to Claude, but someone with a different personal prompting style might experience the opposite.

wiether 727 days ago

Having a prompting style that works with a model but not quite with another is much different than "suffering from poor prompting" the previous person was accusing others.

And given that those are tools, it's more like "the model can work with the user's prompts" rather than "the user's prompts are adapted to the model".

Unless we're here for an ego trip.

prng2021 727 days ago

Ah, I see. I’d be interested to see a study on that. I find it hard to believe it would make such a stark difference but it’s possible.

SSLy 727 days ago

are non-snake oil prompting techniques described anywhere?

simonw 727 days ago

Those are hard to come by, but the Anthropic prompting documentation is a pretty great source: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

viraptor 728 days ago

On the plus side, at least ChatBoost supports both openai and claude API. But for this specific model it seems to be broken... I hope that gets noticed and fixed soon.

gotrythis 728 days ago

What I understand is that it's GPT 6 that just went into training, and that GPT 5 is complete and being delayed until after the U.S. election.

PaulWaldman 728 days ago

And after GPT-5's release, what would be the plan for subsequent elections? This seems to be a temporary play in delaying AI regulation if public sentiment further becomes that AI can have a strong influence in the elections.

futureshock 728 days ago

It’s absolutely temporary, but 4 years feels like an eternity in this field and the m sure the major players would love to have that much time to entrench themselves before they have to battle “AI ban” legislation.

imjonse 728 days ago

GPT-5 will make elections obsolete :)

Sysreq2 728 days ago

Roko would be proud of you. I welcome our new electric masters.

antifa 727 days ago

Managed democracy offers absolute freedom; freedom from the burden of choice

viraptor 728 days ago

It there any online confirmation of this, that's more than speculation?

icpmacdo 728 days ago

No there is not

r2_pilot 728 days ago

(assuming you are correct) It says something about how a company feels about the safety of their products when they feel like they should time the releases based on political events.

futureshock 728 days ago

This is speculation because I don’t think any of the key players ever explicitly stated this is their strategy, but this year it feels like there’s some significant foot dragging on things like Sora and GPT-5. The big AI players really don’t want AI to become an election year punching bag and don’t want any major campaign promises around AI to placate a spooked electorate. And they really don’t want it to be revealed that generative AI powered bot armies outnumber real human political discourse 10-1. And they absolutely do not want an AI generated hoax video to have a measurable effect on the polls.

It’s a stopgap. If we get through this election without a major public freak out, it gives the industry 4 more years to take LLMs out to the point of diminishing returns and figure out safety before we get knee jerk regulation.

modeless 728 days ago

This is pure speculation, right?

gotrythis 728 days ago

Here's something that talks about it. I can't speak for the legitimacy, but I'm not pulling it out of my ass. They may be pulling it out of theirs. :-)

https://lifearchitect.ai/gpt-6/

gotrythis 728 days ago

I've listened to so many interviews that I couldn't tell you who said what at this point, but that is what I understood from somewhere. So, sure, take it as speculation.

sva_ 728 days ago

Source: trust me bro

ilaksh 728 days ago

I also believe that gpt-4o was originally called gpt-5. If you look at the image generation on their website from gpt-4o which has not been released, I believe that along with the voice caused Ilya to declare mission accomplished (AGI) and that is why there was a coup. The coup failed because no one wanted to wrap up the company or change the way it operated because they would lose a lot of money.

The reason the name was changed was because there was a big public scare about gpt-5 taking over and so Altman had to promise not to release gpt-5 soon. So they changed the name to gpt-4o (omni). Which is A) obviously dramatically a different architecture, B) a huge step up in capabilities (most still unreleased) C) very general purpose. Because of A) and B), this should obviously be a new major version (5).

Yes, this is speculation, but it's very obvious speculation to me. It's weird for me that most people not only don't share this view but seem to absolutely hate when I say it.

christianqchung 727 days ago

I don't hate this speculation, I just don't buy it at all. 4o's about the same in terms of reasoning as 4. People don't find the text abilities that much more usable over 4 (at least on the LMS leaderboard). It's faster and has audio2audio capabilities alongside new native image stuff I think, but how exactly is that AGI if 4 isn't? These models understanding and reasoning ability is still far too weak to do any serious economic shifts yet.

ilaksh 727 days ago

Scroll to Explorations of Capabilities: https://openai.com/index/hello-gpt-4o/

That combined with the voice was probably considered AGI by Ilya.

christianqchung 727 days ago

Yes, I've seen this. Read my comment.

Rastonbury 727 days ago

It's speculation with no basis at all, OAI has a track record of releasing half step models and 4o is no different just like 3 to 3.5 and the numerous subsequent 3.5 releases.

If you've used 4 and 4o they are too similar for 4o to have been trained from scratch