by url00 214 days ago
I don't want a more conversational GPT. I want the _exact_ opposite. I want a tool with the upper limit of "conversation" being something like LCARS from Star Trek. This is quite disappointing as a current ChatGPT subscriber.
18 comments

That's what the personality selector is for: you can just pick 'Efficient' (formerly Robot) and it does a good job of answering tersely?

https://share.cleanshot.com/9kBDGs7Q

FWIW I didn't like the Robot / Efficient mode because it would give very short answers without much explanation or background. "Nerdy" seems to be the best, except with GPT-5 instant it's extremely cringy like "I'm putting my nerd hat on - since you're a software engineer I'll make sure to give you the geeky details about making rice."

"Low" thinking is typically the sweet spot for me - way smarter than instant with barely a delay.

I hate its acknowledgement of its personality prompt. Try having a series of back and forth and each response is like “got it, keeping it short and professional. Yes, there are only seven deadly sins.” You get more prompt performance than answer.
I like the term prompt performance; I am definitely going to use it:

> prompt performance (n.)

> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.

:)

Might be a result of using LLMs to evaluate the output of other LLMs.

LLMs probably get higher scores if they explicitly state that they are following instructions...

It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.
That's the equivalent of a performative male, so better call it performative model behaviour.
Pay people $1 an hour and ask them to choose A or B, which is shorter and more professional:

A) Keeping it short and professional. Yes, there are only seven deadly sins

B) Yes, there are only seven deadly sins

Also have all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian beauty contest / Family Feud "survey says"-style guess instead of their true evaluation.
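
To make that incentive concrete, here is a toy sketch. Everything in it is an assumption for illustration (the function name, and the idea that a rater is rewarded only for matching the majority); it is not a description of how any lab actually runs its evaluations.

```python
def reported_choice(true_preference: str, belief_majority_picks_a: float) -> str:
    """Answer a rater submits when only agreement with the majority is rewarded.

    true_preference ("A" or "B") only matters on an exact tie; otherwise the
    rater reports whatever they expect most other raters to pick.
    """
    expected_agreement = {
        "A": belief_majority_picks_a,
        "B": 1.0 - belief_majority_picks_a,
    }
    best = max(expected_agreement.values())
    # On an exact tie, fall back to the rater's honest opinion.
    if expected_agreement[true_preference] == best:
        return true_preference
    return max(expected_agreement, key=expected_agreement.get)


# Hypothetical rater: honestly prefers B, but guesses 70% of peers will pick A.
print(reported_choice("B", 0.7))
```

In this toy model the honest opinion drops out of the decision as soon as the rater has any belief about what the majority will say, which is the beauty-contest effect.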

I can’t tell if you’re being satirical or not…
This is even worse on voice mode. It's unusable for me now.
I use Efficient or robot or whatever. It gives me a bit of sass from time to time when I subconsciously nudge it into taking a “stand” on something, but otherwise it’s very usable compared to the obsequious base behavior.
If only that worked for conversation mode as well. At least for me, and especially when it answers me in Norwegian, it will start off with all sorts of platitudes and whole sentences repeating exactly what I just asked. "Oh, so you want to do x, huh? Here is answer for x". It's very annoying. I just want a robot to answer my question, thanks.
At least it gives you an answer. It usually just restates the problem for me and then ends with “so let’s work through it together!” Like, wtf.
Repeating what is being asked is fine, I think; sometimes it thinks you want something different from what you actually want. What is annoying is "that's an incredibly insightful question that delves into a fundamental..." type responses at the start.
At least for the Thinking model it's often still a bit long-winded.
Unfortunately, I also don't want other people to interact with a sycophantic robot friend, yet my picker only applies to my conversation
Hey, you leave my sycophantic robot friend alone.
Sorry that you can't control other people's lives & wants
This is like arguing that we shouldn't try to regulate drugs because some people might "want" the heroin that ruins their lives.

The existing "personalities" of LLMs are dangerous, full stop. They are trained to generate text with an air of authority and to tend to agree with anything you tell them. It is irresponsible to allow this to continue while not at least deliberately improving education around their use. This is why we're seeing people "falling in love" with LLMs, or seeking mental health assistance from LLMs that they are unqualified to render, or plotting attacks on other people that LLMs are not sufficiently prepared to detect and thwart, and so on. I think it's a terrible position to take to argue that we should allow this behavior (and training) to continue unrestrained because some people might "want" it.

What's your proposed solution here? Are you calling for legislation that controls the personality of LLMs made available to the public?
There aren't many major labs, and they each claim to want AI to benefit humanity. They cannot entirely control how others use their APIs, but I would like their mainline chatbots to not be overly sycophantic and generally to not try to foster human-AI friendships. I can't imagine any realistic legislation, but it would be nice if the few labs just did this of their own accord (or were at least shamed more for not doing so).
At the very least, I think there is a need for oversight of how companies building LLMs market and train their models. It's not enough to cross our fingers that they'll add "safeguards" to try to detect certain phrases/topics and hope that that's enough to prevent misuse/danger — there's not sufficient financial incentive for them to do that of their own accord beyond the absolute bare minimum to give the appearance of caring, and that's simply not good enough.
Pretty sure most of the current problems we see re drug use are a direct result of the nanny state trying to tell people how to live their lives. Forcing your views on people doesn’t work and has lots of negative consequences.
Okay, I'm intrigued. How in the fuck could the "nanny state" cause people to abuse heroin? Is there a reason other than "just cause it's my ideology"?
Comparing LLM responses to heroine is insane.
I'm not saying they're equivalent; I'm saying that they're both dangerous, and I think taking the position that we shouldn't take any steps to prevent the danger because some people may end up thinking they "want" it is unreasonable.
heroin is the drug, heroine is the damsel :)
You’re absolutely right!

The number of heroine addicts is significantly lower than the number of ChatGPT users.

I am with you. Insane comparisons are the first signs of an activist at work.
Disincentivizing something undesirable will not necessarily lead to better results, because it wrongly assumes that you can foresee all consequences of an action or inaction.

Someone who now falls in love with an LLM might instead fall for some seductress who hurts him more. Someone who now receives bad mental health assistance might receive none whatsoever.

Your argument suggests that we shouldn’t ever make laws or policy of any kind, which is clearly wrong.
I disagree with your premise entirely and, frankly, I think it's ridiculous. I don't think you need to foresee all possible consequences to take action against what is likely, especially when you have evidence of active harm ready at hand. I also think you're failing to take into account the nature of LLMs as agents of harm: so far it has been very difficult for people to legally hold LLMs accountable for anything, even when those LLMs have encouraged suicidal ideation or physical harm of others, among other obviously bad things.

I believe there is a moral burden on the companies training these models to not deliberately train them to be sycophantic and to speak in an authoritative voice, and I think it would be reasonable to attempt to establish some regulations in that regard in an effort to protect those most prone to predation of this style. And I think we need to clarify the manner in which people can hold LLM-operating companies responsible for things their LLMs say — and, preferably, we should err on the side of more accountability rather than less.

---

Also, I think in the case of "Someone who now receives bad mental health assistance might receive none whatsoever", any psychiatrist (any doctor, really) will point out that this is an incredibly flawed argument. It is often the case that bad mental health assistance is, in fact, worse than none. It's that whole "first, do no harm" thing, you know?

Who are you to determine what other people want? Who made you god?
...nobody? I didn't determine any such thing. What I was saying was that LLMs are dangerous and we should treat them as such, even if that means not giving them some functionality that some people "want". This has nothing to do with playing god and everything to do with building a positive society where we look out for people who may be unable or unwilling to do so themselves.

And, to be clear, I'm not saying we necessarily need to outlaw or ban these technologies, in the same way I don't advocate for criminalization of drugs. But I think companies managing these technologies have an onus to take steps to properly educate people about how LLMs work, and I think they also have a responsibility not to deliberately train their models to be sycophantic in nature. Regulations should go on the manufacturers and distributors of the dangers, not on the people consuming them.

here’s something I noticed: If you yell at them (all caps, cursing them out, etc.), they perform worse, similar to a human. So if you believe that some degree of “personable answering” might contribute to better correctness, since some degree of disagreeable interaction seems to produce less correctness, then you might have to accept some personality.
Interesting, Codex just did the work once I swore. Wasted 3-4 prompts being nice, and the angry style made it do it.
Actually DeepSeek performs better for me in terms of prompt adherence.
ChatGPT 5.2: allow others to control everything about your conversations. Crowd favorite!
so good.
You’re getting downvoted but I agree with the sentiment. The fact that people want a conversational robot friend is, I think, extremely harmful and scary for humanity.

Giving people what makes them feel good in the short term is not actually necessarily a good thing. See also: cigarettes, alcohol, gambling, etc.

Exactly. Stop fooling people into thinking there’s a human typing on the other side of the screen. LLMs should be incredibly useful productivity tools, not emotional support.
How would you propose we address the therapist shortage then?
Who ever claimed there was a therapist shortage?
The process of providing personal therapy doesn't scale well.

And I don't know if you've noticed, but the world is pretty fucked up right now.

... because it doesn't have enough therapists?
People are so naive if they think most people can solve their problem with a one hour session a week.
i think most western governments and societies at large
It's a demand-side problem. Improve society so that people feel less of a need for therapists.
Oh, so you think we should improve society somewhat, eh? But you yourself live in society. Gotcha!
I think therapists in training, or people providing crisis intervention support, can train/practice using LLMs acting as patients going through various kinds of issues. But people who need help should probably talk to real people.
Remember that a therapist is really a friend you are paying for.

Then make more friends.

>Remember that a therapist is really a friend you are paying for.

That's an awful, and awfully wrong definition that's also harmful.

It's also disrespectful and demeaning to both the professionals and people seeking help. You don't need to get a degree in friendship to be someone's friend. And having friends doesn't replace a therapist.

Please avoid saying things like that.

outlaw therapy
I don't know why you're being downvoted. Denmark's health system is pretty good except adult mental health. SOTA LLMs are definitely approaching a stage where they could help.
something something bootstraps
Food should only be for sustenance, not emotional support. We should only sell brown rice and beans, no more Oreos.
Oreos won't affirm your belief that suicide is the correct answer to your life problems, though.
That is mostly a dogmatic question, rooted in (western) culture, though. And even we have started to - begrudgingly - accept that there are cases where suicide is the correct answer to your life problems (usually as of now restricted to severe, terminal illness).
The point the OP is making is that LLMs are not reliably able to provide safe and effective emotional support as has been outlined by recent cases. We're in uncharted territory and before LLMs become emotional companions for people, we should better understand what the risks and tradeoffs are.
I wonder if statistically (hand waving here, I’m so not an expert in this field) the SOTA models do as much or as little harm as their human counterparts in terms of providing safe and effective emotional support. Totally agree we should better understand the risks and trade-offs, but I wouldn’t be super surprised if they are statistically no worse than us meat bags at this kind of stuff.
One difference is that if it were found that a psychiatrist or other professional had encouraged a patient's delusions or suicidal tendencies, then that person would likely lose his/her license and potentially face criminal penalties.

We know that humans should be able to consider the consequences of their actions and thus we hold them accountable (generally).

I'd be surprised if comparisons in the self-driving space have not been made: if waymo is better than the average driver, but still gets into an accident, who should be held accountable?

Though we also know that with big corporations, even clear negligence that leads to mass casualties does not often result in criminal penalties (e.g., Boeing).

> that person would likely lose his/her license and potentially face criminal penalties.

What if it were an unlicensed human encouraging someone else's delusions? I would think that's the real basis of comparison, because these LLMs are clearly not licensed therapists, and we can see from the real world how entire flat earth communities have formed from reinforcing each other's delusions.

Automation makes things easier and more efficient, and that includes making it easier and more efficient for people to dig their own rabbit holes. I don't see why LLM providers are to blame for someone's lack of epistemological hygiene.

Also, there are a lot of people who are lonely and for whatever reasons cannot get their social or emotional needs met in this modern age. Paying for an expensive psychiatrist isn't going to give them the friendship sensations they're craving. If AI is better at meeting human needs than actual humans are, why let perfect be the enemy of good?

> if waymo is better than the average driver, but still gets into an accident, who should be held accountable?

Waymo of course -- but Waymo also shouldn't be financially punished any harder than humans would be for equivalent honest mistakes. If Waymo truly is much safer than the average driver (which it certainly appears to be), then the amortized costs of its at-fault payouts should be way lower than the auto insurance costs of hiring out an equivalent number of human Uber drivers.
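
The amortized-cost argument above is just expected-value arithmetic. A rough sketch follows; the crash rates and payout figure are made-up placeholders, not real Waymo or insurance statistics, and the function name is hypothetical.

```python
def expected_payout_per_mile(at_fault_crashes_per_million_miles: float,
                             avg_payout_per_crash: float) -> float:
    """Expected at-fault liability cost per mile driven."""
    return at_fault_crashes_per_million_miles * avg_payout_per_crash / 1_000_000


# Placeholder numbers, chosen only to show the shape of the comparison.
human_cost = expected_payout_per_mile(4.0, 20_000.0)
robot_cost = expected_payout_per_mile(1.0, 20_000.0)

# Same payout per crash for both, so the robot is not "punished harder";
# the per-mile cost differs only because the assumed crash rate differs.
print(human_cost, robot_cost)
```

With the payout per crash held equal, whoever crashes less often pays less per mile, which is the whole claim.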

They also are not reliably able to provide safe and effective productivity support.
Maybe there is a human typing on the other side, at least for some parts or all of certain responses. It's not been proven otherwise..
I think they get way more "engagement" from people who use it as their friend, and the end goal of subverting social media and creating the most powerful (read: profitable) influence engine on earth makes a lot of sense if you are a soulless ghoul.
It would be pretty dystopian when we get to the point where ChatGPT pushed (unannounced) advertisements to those people (the ones forming a parasocial relationship with it). Imagine someone complaining they're depressed and ChatGPT proposing doing XYZ activity which is actually a disguised ad.

Other than such scenarios, that "engagement" would be just useless and actually costing them more money than it makes

Do you have reason to believe they are not doing this already?
No, otherwise Sam Altman wouldn’t have had an outburst about revenue. They know that they have this amazing system, but they haven’t quite figured out how to monetize it yet.
Yes, I've heard no reports of poorly fitting branded recommendations from AI models. The PR risk would be huge for labs, the propensity to leak would be high given the selection effects that pull people to these roles.
I've not heard of it, either.

But I suspect that we're no more than one buyout away from that kind of thing.

The labs do appear to avoid paid advertising today. But actions today should not be taken to mean that the next owner(s) won't behave in a completely soulless manner in their effort to maximize profit at every possible expense.

On a long-enough timeline, it seems inevitable to me that advertising with LLM bots will become a real issue.

(I mean: I remember having an Internet experience that was basically devoid of advertising. It changed, and it will never change back.)

Not really, but with the amounts of money they're bleeding it's bound to get worse if they are already doing it.
I use the "Nerdy" tone along with the Custom Instructions below to good effect:

"Please do not try to be personal, cute, kitschy, or flattering. Don't use catchphrases. Stick to facts, logic, reasoning. Don't assume understanding of shorthand or acronyms. Assume I am an expert in topics unless I state otherwise."

This. When I go to an LLM, I'm not looking for a friend, I'm looking for a tool.

Keeping faux relationships out of the interaction never lets me slip into the mistaken attitude that I'm dealing with a colleague rather than a machine.

I don't know about you, but half my friends are tools.
You can just tell the AI to not be warm and it will remember. My ChatGPT used the phrase "turn it up to eleven" and I told it never to speak in that manner ever again, and it's been very robotic ever since.
I added the custom instruction "Please go straight to the point, be less chatty". Now it begins every answer with: "Straight to the point, no fluff:" or something similar. It seems to be perfectly unable to simply write out the answer without some form of small talk first.
Aren't these still essentially completion models under the hood?

If so, my understanding for these preambles is that they need a seed to complete their answer.

But the seed is the user input.
Maybe until the model outputs some affirming preamble, it’s still somewhat probable that it might disagree with the user’s request? So the agreement fluff is kind of like it making the decision to heed the request. Especially if we consider the tokens as the medium by which the model “thinks”. Not to anthropomorphize the damn things too much.

Also I wonder if it could be a side effect of all the supposed alignment efforts that go into training. If you train in a bunch of negative reinforcement samples where the model says something like “sorry I can’t do that” maybe it pushes the model to say things like “sure I’ll do that” in positive cases too?

Disclaimer that I am just yapping

I had a similar instruction and in voice mode I had it trying to make a story for a game that my daughter and I were playing where it would occasionally say “3,2,1 go!” or perhaps throw us off and say “3,2,1, snow!” or other rhymes.

Long story short it took me a while to figure out why I had to keep telling it to keep going and the story was so straightforward.

This is very funny.
Since switching to robot mode I haven’t seen it say “no fluff”. Good god I hate it when it says no fluff.
I system-prompted all my LLMs "Don't use cliches or stereotypical language." and they like me a lot less now.
They really like to blow sunshine up your ass, don’t they? I have to do the same type of stuff. It’s like I have to assure it that I’m a big boy and I can handle mature content like programming in C.
Same. If I tell it to choose A or B, I want it to output either “A” or “B”.

I don’t want an essay of 10 pages about how this is exactly the right question to ask

10 pages about the question means that the subsequent answer is more likely to be correct. That's why they repeat themselves.
But that goes in the chain of thought, not the response
citation needed
First of all, consider asking "why's that?" if you don't know what is a fairly basic fact, no need to go all reddit-pretentious "citation needed" as if we are deeply and knowledgeably discussing some niche detail and came across a sudden surprising fact.

Anyways, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens. This is because each token takes a roughly fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in the DeepSeekMath paper) that RL with GRPO on math problems automatically encourages this (increases sequence length).

From a marketing perspective, this is anthropomorphized as reasoning.

From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
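
A back-of-the-envelope sketch of the "more tokens means more compute" point above. It assumes the common rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense transformer forward pass, and it ignores the attention cost that grows with context length; the 7B parameter count is a hypothetical stand-in, not any specific model.

```python
def forward_flops(num_params: int, num_tokens: int) -> int:
    """Rough forward-pass FLOPs: about 2 FLOPs per parameter per generated token.

    This is an approximation for dense models and leaves out the
    context-length-dependent attention term.
    """
    return 2 * num_params * num_tokens


HYPOTHETICAL_PARAMS = 7_000_000_000  # assumed model size, for illustration only

terse = forward_flops(HYPOTHETICAL_PARAMS, 1)      # answering just "A" or "B"
verbose = forward_flops(HYPOTHETICAL_PARAMS, 500)  # reasoning first, then answering

# The compute budget scales linearly with the number of tokens emitted.
print(verbose // terse)
```

Under these assumptions, a one-token answer gets one forward pass worth of compute no matter how hard the question is; emitting more tokens (visible or hidden) is the only knob that raises the budget.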

A citation would be a link to an authoritative source. Just because some unknown person claims it's obvious that's not sufficient for some of us.
Expecting every little fact to have an "authoritative source" is just annoying faux intellectualism. You can ask someone why they believe something and listen to their reasoning, decide for yourself if you find it convincing, without invoking such a pretentious phrase. There are conclusions you can think to and reach without an "official citation".
LLMs have essentially no capability for internal thought. They can't produce the right answer without doing that.

Of course, you can use thinking mode and then it'll just hide that part from you.

No, even in thinking mode it will sycophant and write huge essays as output.

It can work without, I just have to prompt it five times increasingly aggressively and it’ll output the correct answer without the fluff just fine.

They already do hide a lot from you when thinking; this person wants them to hide more instead of doing their 'thinking' 'out loud' in the response.
Zachary Stein makes the case that conferring social statuses to Artificial Intelligences is an x-risk. https://cic.uts.edu.au/events/collective-intelligence-edu-20...
Your comment reminded me of this article because of the Star Trek comparison. Chatting is inefficient, isn't it?

[1] https://jdsemrau.substack.com/p/how-should-agentic-user-expe...

Exactly, and it doesn't help with agentic use cases that tend to solve a problem in one shot. For example, there is zero requirement for a model to be conversational when it is trying to triage a support question into preset categories.
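
For the triage case, a minimal sketch of what "not conversational" can mean in practice: accept the model's reply only if it is exactly one of the preset labels, and treat anything chatty as a miss. The category names and the `parse_category` helper are hypothetical, and no particular LLM API is assumed; `reply` is just whatever string the model returned.

```python
# Hypothetical preset categories for a support queue.
CATEGORIES = ("billing", "bug_report", "account", "other")


def parse_category(reply: str, categories=CATEGORIES, fallback: str = "other") -> str:
    """Return the reply as a category label, or the fallback if it is not an exact match.

    Whitespace and case are normalized; any extra words make the reply invalid.
    """
    label = reply.strip().lower()
    return label if label in categories else fallback


print(parse_category(" Billing\n"))
print(parse_category("Sure! This looks like a billing question."))
```

A strict parser like this makes any preamble an outright failure rather than a stylistic annoyance, which is why terse output matters more for pipelines than for chat.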
Are you aware that you can achieve that by going into Personalization in Settings and choosing one of the presets or just describing how you want the model to answer in natural language?
Yea, I don't want something trying to emulate emotions. I don't want it to even speak a single word, I just want code, unless I explicitly ask it to speak on something, and even in that scenario I want raw bullet points, with concise useful information and no fluff. I don't want to have a conversation with it.

However, being more humanlike, even if it results in an inferior tool, is the top priority because appearances matter more than actual function.

To be fair, of all the LLM coding agents, I find Codex+GPT5 to be closest to this.

It doesn't really offer any commentary or personality. It's concise and doesn't engage in praise or "You're absolutely right". It's a little pedantic though.

I keep meaning to re-point Codex at DeepSeek V3.2 to see if it's a product of the prompting only, or a product of the model as well.

It is absolutely a product of the model, GPT-5 behaves like this over API even without any extra prompts.
I prefer its personality (or lack of it) over Sonnet. And tends to produce less... sloppy code. But it's far slower, and Codex + it suffers from context degradation very badly. If you run a session too long, even with compaction, it starts to really lose the plot.
Just put it in your system prompt?
Enable "Robot" personality. I hate all the other modes.
Gemini is very direct.
Engagement Metrics 2.0 are here. Getting your answer in one shot is not cool anymore. You need to waste as much time as possible on OpenAI's platform. Enshittification is now more important than AGI.
This is the AI equivalent of every recipe blog filled with 1000 words of backstory before the actual recipe just to please the SEO Gods

The new boss, same as the old boss

Things really felt great 2023-2024
Exactly. The GPT 5 answer is _way_ better than the GPT 5.1 answer in the example. Less AI slop, more information density please.
And utterly unsurprising given their announcement last month that they were looking at exploring erotica as a possible revenue stream.

[1] https://www.bbc.com/news/articles/cpd2qv58yl5o

Everyone else provides these services anyway, and many places offer using ChatGPT or Claude models despite current limits (because they work with "jailbreaking" prompts), so they likely decided to stop pretending and just let that stuff in.

What's the problem, tbh?