Hacker News
by gateorade 1179 days ago
This has been my experience. I’m really impressed by how well GPT-4 seems to be able to interpolate between problems heavily represented in the training data to create what feels like novelty, e.g. creating a combination of Pong and Conway’s Game of Life, but it doesn’t seem to be good at extrapolation.

The type of work I do is highly niche. I’ve recently been working on a specific problem for which there are probably at most a hundred implementations running on production systems, all of them highly proprietary. I would be surprised if there were any implementations in GPT’s training set. With that said, this problem is not actually that complicated. A rudimentary implementation can be done in ~100 lines of code.

I asked GPT-4 to write me an implementation. It knew a decent amount about the problem (probably from Wikipedia). If it was actually capable of something close to reasoning it should have been able to write an implementation, but when it actually started writing code it was reluctant to write more than a skeleton. When I pushed it to implement specific details it completely fell apart and started hallucinating. When I gave it specific information about what it was doing wrong it acknowledged that it made a mistake and simply gave me a new equally wrong hallucination.

The experience calmed my existential fears about my job being taken by AI.

12 comments

This exact scenario is what I described to a friend of mine who is an AI researcher.

He was convinced that if we trained the AI on enough data, GPT-x would become sentient.

My opinion was similar to yours. I felt like the hallucinating the AI does was insufficient in performing true extrapolating thought.

I said this because humans don’t truly have access to infinite knowledge, and even when they do, they can’t process all of it. Adding endless information for the AI to feed on doesn’t seem like the solution to figuring out true intelligence. It’s just more of the same hallucinating.

Yet despite lacking knowledge, us humans still come up with consistently original thoughts and expressions of our intelligence daily. With limited information, our minds create new representations of understanding. This seems to be impossible for ChatGPT.

I could be completely wrong, but that discussion solidified for me that my role as a dev still has at least a couple more decades of shelf life left.

It’s nice to hear that others are reaching similar conclusions.

Current LLMs decode in a greedy manner, token by token. In some cases this is good enough - namely for continuous tasks, but in other cases the end result means the model has to backtrack and try another approach, or edit the response. This doesn't work well with the way we are using LLMs now, but could be fixed. Then you'd get a model that can do discontinuous tasks as well.

>> Write a response that includes the number of words in your response.

> This response contains exactly sixteen words, including the number of words in the sentence itself.

It contains 15 words.

The model would have to plan everything before outputting the first token if it were to solve the task correctly. Works if you follow up with "Explicitly count the words", let it reply, then "Rewrite the answer".
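The "count, then rewrite" follow-up can be sketched offline. This is a minimal sketch, assuming a fixed sentence template and a small hand-written table of number words (`NUMBER_WORDS`, `TEMPLATE` and `self_consistent` are illustrative names, not anything the model runs internally). It shows why the task needs either planning or a second pass: the right number word depends on the length of the finished sentence.

```python
from typing import Optional

# Hypothetical lookup table: spelled-out numbers for the range we search.
NUMBER_WORDS = {
    12: "twelve", 13: "thirteen", 14: "fourteen", 15: "fifteen",
    16: "sixteen", 17: "seventeen", 18: "eighteen",
}

# The model's sentence from the thread, with the number left as a slot.
TEMPLATE = ("This response contains exactly {n} words, including "
            "the number of words in the sentence itself.")


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def self_consistent(template: str) -> Optional[str]:
    """Try each number word and keep the one that matches the real count."""
    for n, word in NUMBER_WORDS.items():
        candidate = template.format(n=word)
        if word_count(candidate) == n:
            return candidate
    return None


# The answer the model actually gave claims sixteen words.
model_answer = TEMPLATE.format(n="sixteen")
print(word_count(model_answer))      # 15
print(self_consistent(TEMPLATE))     # the same sentence with "fifteen"
```

The check is trivial once the whole sentence exists, which is the point: a left-to-right decoder has to commit to the number word before the rest of the sentence has been produced.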

> but could be fixed

How? The problem has been known for a while; for example, this article [0] mentions it (as Chain of Thought reasoning). You could think that just having a scratchpad of tokens is enough - you can arguably plan, backtrack and rewrite there [1], right? But this doesn't really work, at least not yet - maybe because it wasn't trained for that - and maybe ChatGPT's massive logs (probably available only to OpenAI) can help. But the Microsoft report [2] suggests we need a different architecture and/or algorithms? They mention lack of planning and retrospective thinking as a huge problem for GPT-4. Maybe you know some articles with ideas on how to fix this? Backtracking and trying again seem to be linked to human thought - and could very well give us AGI.

[0] https://arxiv.org/abs/2201.11903

[1] https://www.reddit.com/r/ChatGPT/comments/120fi8e/chatgpt_4_...

[2] https://arxiv.org/abs/2303.12712

You may be shocked to hear this but Dijkstra’s shortest path algorithm is the technical answer to this question. We just don’t use it because it’s expensive.
Language chains or tool use where it can also call on itself to solve subproblems. If you don't have to do just one round of LLM interaction you can do complex stuff.
Backtracking to edit the response is theoretically easily solved by training on a masked language modeling objective instead of an autoregressive one, but using it to actually generate text is a bit expensive because you can't just generate one token at a time and be done, you might have to reevaluate each output token every time another token is changed. So I expect autoregressive generation to remain the default until the recomputation effort can be significantly reduced or hardware advances make the cost bearable.
>> Backtracking to edit the response is theoretically easily solved by training on a masked language modeling objective instead of an autoregressive one, but using it to actually generate text is a bit expensive because you can't just generate one token at a time and be done, you might have to reevaluate each output token every time another token is changed.

I can't imagine how training on masked tokens can "easily" solve backtracking, even in theory. Do you have some literature I could read on this?

Discrete diffusion with rewriting can work well. It feels loosely similar to backtracking, if you assume n_steps large enough - need to be able to rewrite any non-provided position though I think (not all setups do this). Downside is the noise in discrete diffusion (in simplest case randomizing over all vocabulary space) is pretty harsh and makes things very difficult practically. Don't have an exact reference on the relationship, but feels similar to backtracking type mechanics in my experience. I found things tend to "lock in" quickly once a good path is found, which feels a lot like pathfinding to me.

Some early personal experiments with adding "prefix-style" context by a cross-attention (in the vein of PerceiverAR) seemed like it really helped things along, which would kind of point to search-like behavior as well.

Probably the closest theory I can think of is orderless NADE, which builds on the "all orders" training of https://arxiv.org/abs/1310.1757 , which in my opinion closely relates to BERT and all kinds of other masked language work. There's a lot of other NAR language work I'm skipping here that may be more relevant...

On discrete diffusion:

Continuous diffusion for categorical data shows some promise "walking the boundary" between discrete and continuous diffusion https://arxiv.org/abs/2211.15089 , personally like this direction a lot.

If you have a pre-made embedding space, SSD-LM is a straightforward method https://arxiv.org/abs/2210.17432

SUNDAE worked well for translation https://arxiv.org/abs/2112.06749 and many other tasks.

My own contribution, SUNMASK, worked reasonably well for symbolic music/small datasets (https://openreview.net/forum?id=GIZlheqznkT), but really struggled with anything text or moderately large vocabulary, maybe due to training/compute/arch issues. Personally think large vocabulary discrete diffusion (thinking of the huge vocabs in modern universal LM work) will continue to be a challenge.

Decoding strategies:

As a general aside, I still don't understand how many of the large generative tools aren't exposing more decoding strategies, or hooks to implement them. Beam search with stochastic/diverse group objectives, per-step temperature/top-k/top-p, hooks for things like COLD decoding https://arxiv.org/abs/2202.11705, minimum Bayes risk https://medium.com/mlearning-ai/mbr-decoding-get-better-resu..., check/correct systems during decode based on simple domain rules and previous outputs, etc.
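For readers who have not used these knobs, here is a minimal sketch of per-step temperature, top-k and top-p (nucleus) filtering in plain Python. It assumes the model hands back a list of raw logits for one decoding step; the function names are illustrative and not taken from any particular library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep the top_k most likely tokens, then the smallest prefix of those
    whose cumulative probability reaches top_p, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    mass = sum(probs[i] for i in order)
    return {i: probs[i] / mass for i in order}


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token id from the filtered distribution."""
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids])[0]


print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=2))
print(sample([0.1, 5.0, 0.2], top_k=1))  # top_k=1 is greedy decoding: 1
```

Because the three parameters are plain arguments, a caller could vary them at every step, which is the kind of hook the comment above is asking hosted APIs to expose.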

These kinds of decoding tools have always been a huge boost to model performance for me, and having access to add in these hooks to "big API models" would be really nice... though I guess you would need to limit/lock compute use since a full backtracking search would pretty swiftly crash most systems. Maybe the new "plugins" access from OpenAI will allow some of this.

Backtracking is easily solved with a shortest path algorithm. I don’t see any need for masking if you are simply maximizing likelihood of the entire sequence.
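To make the shortest-path framing concrete, here is a minimal sketch on a toy model. It assumes next-token probabilities are available as a lookup table keyed by prefix (`TOY_MODEL` is invented for illustration) and treats each token as an edge with cost -log p, so the cheapest path to a full-length sequence is the most likely sequence. Uniform-cost search, i.e. Dijkstra on the prefix tree, is valid here because every edge cost is non-negative.

```python
import heapq
import math

# Hypothetical toy "language model": prefix -> next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def greedy(model, length):
    """Pick the single most likely token at each step, never looking back."""
    seq, logp = (), 0.0
    for _ in range(length):
        token, p = max(model[seq].items(), key=lambda kv: kv[1])
        seq += (token,)
        logp += math.log(p)
    return seq, math.exp(logp)


def most_likely_sequence(model, length):
    """Uniform-cost search over prefixes with edge cost -log p."""
    frontier = [(0.0, ())]  # (accumulated cost, prefix)
    while frontier:
        cost, seq = heapq.heappop(frontier)
        if len(seq) == length:
            # The first complete sequence popped has the lowest total cost.
            return seq, math.exp(-cost)
        for token, p in model[seq].items():
            heapq.heappush(frontier, (cost - math.log(p), seq + (token,)))
    raise ValueError("no sequence of the requested length")


print(greedy(TOY_MODEL, 2))                # picks ('a', 'x'), probability about 0.33
print(most_likely_sequence(TOY_MODEL, 2))  # finds ('b', 'x'), probability about 0.36
```

The expense mentioned upthread shows up in the frontier: with a real vocabulary every popped prefix pushes tens of thousands of children, and each push needs a forward pass of the model.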
I don't think humans can do this either. What's the problem with producing a result and then fixing it? It's exactly how we do it.
> This exact scenario is what I described to a friend of mine who is an AI researcher. He was convinced that if we trained the AI on enough data, GPT-x would become sentient. My opinion was similar to yours. I felt like the hallucinating the AI does was insufficient in performing true extrapolating thought.

It turns out it isn’t just AIs that hallucinate; AI researchers do as well.

"researcher".
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.

Is there enough data?

As I understand it, the latest large language models are trained on almost every piece of available text. GPT-4 is multimodal in part because there isn't an easy way to increase its dataset with more text. In the meantime, text is already quite information dense.

I'm not sure that future models will be able to train on an order of magnitude more information, even if the size of their training sets has a few more zeroes added to the end.

what about all the content not yet in text form (e.g. YouTube videos)?
The threshold for sentience is continually falling.

So he might be right but due to time and not due to improved performance.

I believe in the UK all vertebrates are considered sentient (by law not science). That includes goldfish.

And good luck even getting a goldfish to reverse a linked list. Even after 1000 implementations are provided.

I don't think that when people commonly discuss sentience they mean to include goldfish. I don't think the legal definition (which probably exists due to external legal implications) has any bearing on the intellectual debate of AI sentience.
Sentience is just the capacity to experience feelings and sensations. Goldfish can do it and AI can’t (so far).
If I were talking about sentience I would definitely be including goldfish. What about them is so different to us that we would have sentience while they would not?
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.

Not saying your friend is right or wrong, but imagine if civilization gives more information, in realtime, to an AI system through sensors: will it be at least as sentient as the civilization? Seems like a scifi story, a competitor to G-d.

Isaac Asimov wrote a story along those lines, “The Last Question”, which he described as “by far my favorite story of all those I have written.” Full text here:

https://xpressenglish.com/our-stories/the-last-question/

Some versions of divinity (both from real-world beliefs and sci-fi/fantasy) have it being essentially a gestalt of either all the souls that have ever died, or all those alive now—a kind of "oversoul" or collective consciousness.

While that's an interesting thought experiment, I don't think it can meaningfully apply to any kind of AI we have the capability to make today, even if we could hook it up directly to all our knowledge. Information alone can't make something sentient; it requires a sufficiently complex and sophisticated information processing system, one that can reason about its knowledge and itself.

I’m not at all an expert on the topic, but from what I gathered LLMs are fundamentally limited in the kind of problems they can approximate. They can approximate any integrable function quite well, but we can only come up with limits on a case-by-case basis for non-integrable ones, and I believe most interesting problems are of this latter kind.

Correct me if I’m wrong, but doesn’t it mean that they can’t recursively “think”, on a fundamental basis? And sure I know that you can pass “show your thinking” to GPT, but that’s not general recursion, just “hard-coded to N iterations” basically, isn’t it? And thus no matter how much hardware we throw at it, it won’t be able to surpass this fundamental limit (and without proof, I firmly believe that for a GAI we do need the ability to basically follow through a train of thought)

How is it "hard-coded to N iterations"? We don't instruct the model how many lines of working it should show.

Obviously there is a limit to how much it can fit in the context, but that seems to be rising fast (went from 4k to 32k in not that long)

It fundamentally can’t recurse into a thought process. Let’s say I give you a symbol table where each symbol means something and ask you to “evaluate” this list of symbols. You can do that just fine, but even in theory not even GPT-10384 will be able to do that without changing the whole underlying model itself.
I don't understand the task. What does evaluating the list of symbols mean?

Do you mean you define a programming language/bytecode and then feed it into the model?

Here's an example where GPT-4 did this perfectly for a very simple language. This was my first attempt, I did not have to do any trial and error.

https://pastebin.com/4YA5wpie

Could you try writing even in this simple language a longer program? Just simply increase the input to 20x or something around that. I’m interested in whether it will break and if it does, at what length.
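One way to run that experiment is to keep a reference evaluator next to the prompt, so the model's answer can be checked at every length. This is a minimal sketch with an invented three-symbol language; the symbol table below is hypothetical and is not the language used in the pastebin.

```python
# Hypothetical symbol table: each symbol is an operation on a running value.
SYMBOLS = {
    "@": lambda x: x + 1,   # increment
    "#": lambda x: x * 2,   # double
    "%": lambda x: x - 3,   # subtract three
}


def evaluate(program: str, start: int = 0) -> int:
    """Apply each symbol's operation in order, left to right."""
    value = start
    for symbol in program:
        value = SYMBOLS[symbol](value)
    return value


base = "@#%"
print(evaluate(base))        # ((0 + 1) * 2) - 3 = -1

# Scale the same program up 20x and keep the ground truth for comparison
# against whatever the model answers.
scaled = base * 20
ground_truth = evaluate(scaled)
print(len(scaled), ground_truth)
```

Growing the repeat count step by step and comparing the model's answer with `ground_truth` would show at what length, if any, it starts to drift.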
If they aren't already, AIs will be posting content on social media apps. These apps measure the amount of attention you pay to each thing presented to you. If it's more than a picture or a video, but something interactive, then it could also learn how we interact with things in more complex ways. It also gets feedback from us through the comments section. Like biological mutations, AIs will learn which of its (at first) random novel creations we find utility in. It will then better learn what drives us and will learn to create and extrapolate at a much faster pace than us.
> If they aren't already, AIs will be posting content on social media apps.

No, people will be posting content on social media apps that they asked LLMs to write.

It may be done through a script, or API calls, but it's 100% at the instigation, direct or indirect, of a human.

LLMs have no ability to decide independently to post to social media, even if you do write code to give them the technical capability to make such posts.

With the new ChatGPT Plugins, it seems they may actually be able to make POST requests to social media APIs soon. It is likely that an LLM could have "I should post a tweet about this" in its training data.

Granted... currently it is likely humans that have written the code that the new Plugins are allowed to call -- but they have given ChatGPT the ability to execute rudimentary Python scripts and even ffmpeg so I think it is only a matter of time before one outputs a Tweet written by its own code.

> It is likely that an LLM could have "I should post a tweet about this" in its training data.

That only matters if a human has explicitly hooked it up so that when ChatGPT encounters that set of tokens, it executes the "post to Twitter" scripts.

ChatGPT doesn't comprehend the text it's producing, so without humans making specific links between particular bundles of text and the relevant plugin scripts, it will never "decide" to use them.

At a high level, all that would have to happen is a person gives GPT, or something like it, access to a social media page and tells it to post to it with the objective of getting the highest level of interaction and followers.
> Yet despite lacking knowledge, us humans still come up with consistently original thoughts and expressions of our intelligence daily.

I think there is some sampling bias in your observation ;-)

More data will only mean more inference. But at some unexpected moment, the newly created "senseBERT" breaks the barrier between intelligence and consciousness.
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.

It sounds like he doesn't even understand the basics of what GPT is, or what sentience is. GPT is an impressive manipulator/predictor of language, but we have evidence from all sorts of directions that there's more to sentience or consciousness than that.

I would like to propose a thought experiment concerning the realm of knowledge acquisition. Given that the scope of human imagination is inherently limited, it is inevitable that certain information will remain beyond our grasp; these are the so-called "known unknowns." In the event that an individual generates a piece of knowledge from this inaccessible domain, how might it manifest in our perception? It is likely that such knowledge would appear incomprehensible to us. Consequently, it is worth considering the possibility that the GPT model is not, in fact, experiencing hallucinations; rather, our human understanding is simply insufficient to fully grasp its output.
Yeah. Maybe when a baby says "gabadigoibygee", he is using an extremely efficient language that is too sophisticated for our adult brains to comprehend.

Yeah, maybe.

> In the event that an individual generates a piece of knowledge from this inaccessible domain, how might it manifest in our perception? It is likely that such knowledge would appear incomprehensible to us.

If what a person says cannot be comprehended by any other person, we usually have a special term for it.

But the hallucinated code doesn’t work.
This is ridiculously “meta”, but I’ve said the same thing, at some point GPT-x will be useless as it will be beyond our comprehension, that’s if it’s actually “smart”.

My honest opinion is the hallucinations are just gibberish, but are they useful gibberish? Maybe we’re saying the same thing ?

> GPT-x will be useless as it will be beyond our comprehension, that’s if it’s actually “smart”.

Things don’t have to be comprehensible before they’re useful. But they have to work to be useful.

Not hard to check whether code compiles or runs.
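For Python output, that first check is a few lines. This is a minimal sketch, assuming the generated code arrives as a string; it uses the built-in `compile()` and only proves that the code parses.

```python
def compiles(source: str) -> bool:
    """Return True if the source parses as Python, without executing it."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles("total = sum(range(10))"))   # True
print(compiles("def broken(:\n    pass"))   # False

# Passing this check says nothing about behavior: a call to a method that
# does not exist still compiles and only fails when the code is run.
print(compiles("import math\nmath.no_such_function(1)"))   # True
```

So the cheap check catches syntax errors, while hallucinated APIs need an actual run or a test suite to surface.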
> The experience calmed my existential fears about my job being taken by AI.

The issue is that among all the 100k+ software engineers, many don't really do anything novel. How many startups are employing dozens of engineers to create online accessible CRUDs to replace a spreadsheet?

In the company I work for I'd say we have about 15 developers or about 3 teams doing interesting work, and everyone else builds integrations, CRUDs, moves a button there and back in "an experiment", adds a new upsell, etc. All these last parts could be done by a PM or good UX person alone, given good enough tools.

The other parts I'm not worried about either.

For the type of engineers you describe the hard part I think is communication with other devs, communication with product owners, understanding the problem, suggesting different ways of solving the problem, figuring out which department personnel (outside other devs) to talk to about a little detail that you don't have... it's not writing the code which is hard, at least from my experience.
Yes. I won't be worried until the day Joe CEO can write a prompt like "build me an app that lets me know where my employees are at all times," and GPT responds with a list of questions about how Joe imagines this being physically implemented, and then calls up the legal department to clear its methods.
I think this is closer than you expect
The question is... writing the code is a very small part of the job.

Figuring out what code to write is one of the big parts.

Fixing it when it breaks in many creative ways is the other big part.

How good is ChatGPT at fixing bugs? Security bugs or otherwise?

Sure but the other parts you don't need an engineering degree for, the other parts amount to design / product work, not engineering.
1. You don't need an engineering degree for software development in many, many cases. So I don't understand your argument.

2. Engineers design stuff :-) I'm not sure what you mean with "product work". Also, engineers debug and fix stuff :-)

Product work = figuring out what to build
Product work is a fractal and you don't want "product people" designing things past the 2nd or 3rd fractal step, in my experience.
I had a similar experience. I wanted it to write code to draw arcs on a world map, with different bends rather than going on a straight bearing. I did all the tricks, told it to explain its chain of thought, gave it a list of APIs to use (with d3-geo), simplified and simplified and spent a couple hours trying to reframe it.

It just spit out garbage. Because (afaict) there aren't really examples of that specific thing on the Internet. And it's just been weirdly bad at all the cartography-related programming problems I've thrown at it, in general.

And yeah, I'm much less worried about it replacing me now. It's just not.. lucid, yet.

GPT-4 is reasonably good at D3 and drawing arcs on a projection (e.g. orthographic) is not that unique, you’ll find examples of it on Observable. However I wonder if you broke down the problem into a small enough task. It performs best if you provide a clear but brief problem description with a code snippet that already kind of does what you want (e.g. using straight lines) and then just ask it to modify your code to calculate arcs instead. The combination of clear description + code, I found, decreases the likelihood of it getting confused about what you’re asking and hallucinating. If you give it a very long-winded request with no code as a basis for it then good luck.
I did try the code snippet technique, but unfortunately it got it wrong. For example, I gave it code that drew arcs but didn't follow the shortest great-circle distance, and it gave me several plausible-looking approaches that were completely wrong (e.g. telling ctx.arc to draw counterclockwise, which does the wrong thing because it needs to use projections instead.)

I eventually just asked it to compute coordinates to a point c perpendicular to the midpoint on the great arc between a and b, such that the angle between ab and ac is alpha. I tried for hours, asking it to work out equations and name the mathematical identities it used etc. but it was all gibberish.
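For reference, one reading of that geometry does have a closed form. This is a sketch under stated assumptions, not the d3-geo code from the original attempt: points are (lat, lon) in degrees on a unit sphere, "perpendicular to the midpoint" is taken to mean that c sits on the great circle through the midpoint m at right angles to the arc ab, and alpha is the angle at a between the arcs ab and ac. Napier's rule for the right spherical triangle a-m-c then gives tan(d) = tan(alpha) * sin(theta / 2), where theta is the angular length of ab and d is the offset from m to c. The name `bend_point` is hypothetical.

```python
import math


def to_vec(lat, lon):
    """(lat, lon) in degrees -> unit vector."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def to_latlon(v):
    """Unit vector -> (lat, lon) in degrees."""
    x, y, z = v
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def bend_point(a, b, alpha_deg):
    """Point c offset from the midpoint of arc ab so the angle at a is alpha.

    Assumes a and b are neither identical nor antipodal (otherwise the
    midpoint or the plane normal is undefined).
    """
    va, vb = to_vec(*a), to_vec(*b)
    theta = math.acos(max(-1.0, min(1.0, dot(va, vb))))  # angular length of ab
    mid = normalize(tuple(p + q for p, q in zip(va, vb)))
    normal = normalize(cross(va, vb))  # perpendicular to the plane of the arc
    d = math.atan(math.tan(math.radians(alpha_deg)) * math.sin(theta / 2))
    c = tuple(m * math.cos(d) + n * math.sin(d) for m, n in zip(mid, normal))
    return to_latlon(c)


# Two points on the equator, 90 degrees apart, with a 30 degree bend at a.
print(bend_point((0.0, 0.0), (0.0, 90.0), 30.0))   # roughly (22.2, 45.0)
```

A bent arc could then be drawn through a, c and b, for example as two great-circle segments or as a smooth curve through the three points; a negative alpha bends to the other side.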

So the closer you come to writing the code for it the better it does
I imagine that creative approaches to spatial problem solving would be one of the harder areas for it - not just because there are by definition fewer public examples of one-off or original solutions, but also because one has to visualize things in space before figuring out how to code it. These bots don't have a concept of space. I'm thinking of DALL-E (et al.) having problems with "an X above Y, behind Z".
GPT4 has its hands tied behind its back. It does not have active learning and it does not have a robust system of memory or a reward/punishment mechanism. We only now start seeing work on this side [1]

It might not know more than you about your niche. I don't. I would search and I would try to reason, but if I was forced to give a token by token output that is answering the question as truthfully as possible, I might have started saying bullshit as well.

I don't think that the fact that gpt doesn't know things or does some things wrong is sufficient to save dev work from automation.

[1]: https://github.com/noahshinn024/reflexion-human-eval

> The experience calmed my existential fears about my job being taken by AI.

Same for me. I didn't try GPT-4 yet, and not on code from work anyway, but GPT-3 seems borderline useless at this point. The hallucinations are quite significant. Also I tried to produce advice for Agile development with references and, as stated in other articles, the links were either 404s or even completely unrelated articles.

Still I'm taking this seriously. Just considering the leaps that happened with AlphaGo/AlphaZero or autonomous driving, that was considered unthinkable in the respective domains before.

Even if AI only takes over “easy” programming jobs, it might still create a huge downward pressure on compensation.

After all, just look at manufacturing. Compared to 1970 we produce 5x the real output but employ only 50% of the people. The same will likely happen to fields like programming as AI improves.

For the crap devs maybe, but high skill devs and architects will be able to charge more than ever to oversee all of this «productivity» from the AIs.
I asked it to write a trivial C#/dotnet example of two actors where one sends a ping message and the other responds with pong. It couldn't get the setup stage right, called several methods that don't exist, and had a cyclic dependency between actors that would probably take some work to resolve.

Even after several iterations of giving it error messages and writing explanations of what's not working, it didn't even get past the first issue. Sometimes it would agree that it needs to fix something, but would then print back code with exactly the same problem.

Yes, exactly this.

I wrote some questions in the specialist legal field of someone in my household, then started to get into more specialist questions, and then specifically asked about a paper that she wrote innovating a new technique in the field.

The general question answers were very impressive to the attny. The specialist questions started turning up errors and getting concepts backwards - bad answers.

When I got to summarizing the paper with the new technique, it could not have been more wrong. It got the entire concept backwards and wrong, barfing generic and wrong phrases, and completely ignored the long list of citations.

Worse yet, to the point of hilariously bad, when asked for the author, date, and employer of the paper, it was entirely hallucinating. Literally, the line under the title was the date, and after that was "Author: [name], [employer]". It just randomly put up dates and names (or combinations of real names) of mostly real authors and law firms in the region. Even when the errors were pointed out, it would apologize, and then confidently spout a new error. Eventually it got the date correct, and that stuck, but even when prompted with "Look at where it says 'Author: [fname]' and tell me the full name and employer", it would hallucinate a last name and employer. Always with the complete confidence of a drunken bullshit artist.

Similar for my field of expertise.

So, yes, for anything real, we really need to keep it in the middle-of-the-road zone of maximum training. Otherwise, it will provide BS (of course if it is BS we want, it'll produce it on an industrial scale!).

Yeah, in that sense I think one of the next logical steps will be providing on-demand lightweight learning/finetuning of LLM versions/forks (maybe as LoRAs?) as an API and integrated UX based on user chat feedback, while abstracting away all the technical hyperparameter and deployment details involved in a DIY setup. With a lucrative price tag of course.
> but it doesn’t seem to be good at extrapolation.

This is true to varying degrees for every statistical model ever.

Yeah that’s basically my point. The hype on HN/Twitter/etc. forgets this.
What would you be able to write with similar requests, if you'd only ever be allowed to use Notepad, and never run compiler/linter/tests, and not allowed to use Internet?
Given I don't have petabytes of information accessible for instant retrieval (including perfect copies of my language of choice's entire API) I don't think that's comparable. I wouldn't need the entire Internet if I'd memorized a large portion of it.
GPTs don't have access to petabytes of information, that's the point. Only to some internal representation.
Unlike current LLMs, your typical competent programmer would not hallucinate.
Quants' jobs are safe because if it’s public there’s no edge