Hacker News
by RayVR 1171 days ago
This aligns well with my personal experience using GPT-4.

The model provides surprisingly good responses on topics where I know the information is readily available online, even when the exact detail I want would be troublesome to find. I have even found it useful when I know there is a tool for what I want but can’t recall the jargon I’d need to find it via Google. Simply describing the rough idea is enough to get the model to spit out the jargon I need.

However, the moment I ask a real question that goes beyond summarizing something which is covered thousands of times online, I am immediately let down.

Is this just a result of the model’s foundation being the world’s best autocompletion engine? My assessment is “yes”, and I don’t believe any of the upcoming modifications, like plugins, will fundamentally change this.

I have been thinking for a few weeks now that we need another term for large language models trained on colossal datasets: AGK, artificially generally/globally knowledgeable. It can mimic a likeness of problem solving because the corpus it was trained on is full of problem/solution pairs in the abstract. But task it with any novel problem solving challenge outside of its training that is of sufficient complexity and it will balk, thereby precluding it from being AGI, because humans are by nature problem solvers.

Furthermore, I just don’t feel like the transformer architecture is suited to problem solving. I may just be a charlatan, but self-attention over the space of words does not seem like it’s going to be enough, and praying that problem solving falls out as emergent behavior if we just add more parameters is… unscientific-ish? Now, if you could figure out a way to do self-attention over the space of concepts? Maybe you’ve got something.

I feel like AlphaGo ideas and some variation on MCTS is more likely to produce a solid problem solving architecture.

Reading the paper, it seems these are problems a lot of people would fail at too, at least some of the time. More than anything, the conclusion seems to be that LLMs are not superhuman at logical reasoning.

What you’re saying gets to the core of why I would call it AGK and not AGI. Training a transformer on known answers to problems and then observing that it can successfully answer questions related to those problems is cheating.

I think of it the way Ilya suggests the “test for consciousness is to train a model with an absolute absence of any training example remotely referring to the notion of a self or of feeling, and then ask it questions about feeling. If the model can do it, congrats, you’ve discovered consciousness.” Similarly, if you train an architecture exclusively on the building blocks of a particular class of problem, and avoid training it on any sort of problem where it could just reason by analogy to get a correct answer (leaving first-principles reasoning as the only option), then if it can solve the problem, you have a genuine problem-solving architecture.

Meh. Intelligence is intelligence.

It's not cheating for people so asserting that it's cheating for machines just seems like goal post shifting more than anything.

This idea of putting machines through frankly ridiculous hoops that humans wouldn't even pass is just... ehh. Have you seen how children with no language development in childhood turn out?

It just misses the point entirely.

It's like the user down the thread said: some isolated groups will build ASI while the rest of the world bickers about philosophical zombies and consciousness.

> It's not cheating for people so asserting that it's cheating for machines just seems like goal post shifting more than anything.

I genuinely appreciate this argument, and was considering it myself. In which case, I’d almost argue that we “have” already achieved AGI, and maybe it’s just not that thrilling.

If you define AGI as being artificial and generally intelligent at the human level, then yes, we have.

It seems, though, that definitions of AGI have since shifted to "better than human experts in all tasks", in which case no... not yet.

Often it can actually solve more complex problems but needs to have its "hand held". Essentially the model needs to be guided to/through problem solving techniques. We have to remember that LLMs are literally inference engines. They default to providing us with probable results, probable responses. They can be pulled away from these "knee jerk" responses.

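
To make "they default to providing probable responses" a bit more concrete, here is a minimal sketch assuming a made-up four-word next-token distribution (the tokens and scores are invented for illustration, and this is not any vendor's actual decoding code). Greedy decoding always takes the single most probable token; sampling with a higher temperature flattens the distribution so less likely continuations can show up. Note that the "pulling away" described above is done by prompting, which this sketch does not model; it only shows the "most probable by default" part.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(tokens, logits):
    """The 'knee jerk' choice: always return the single most probable token."""
    best = max(range(len(tokens)), key=lambda i: logits[i])
    return tokens[best]


def sample(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its temperature-scaled probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical toy scores for the next word.
tokens = ["the", "a", "quantum", "banana"]
logits = [3.0, 2.5, 0.5, -1.0]

print(greedy(tokens, logits))  # always "the"
rng = random.Random(0)
print([sample(tokens, logits, temperature=1.5, rng=rng) for _ in range(5)])
```

Real models score tens of thousands of tokens at every step, but the trade-off is the same: the most probable continuation is not necessarily the most useful one.
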
> Often it can actually solve more complex problems but needs to have its “hand held”. Essentially the model needs to be guided to/through problem solving techniques.

While I haven’t done experiments with it hooked up to enough resources to really solve problems autonomously, when I give it access to look up information (e.g., searching Wikipedia) and to do simple computation (e.g., sending Python expressions to be evaluated), it figures out a lot more than the chat interface alone does, without hand-holding. I think autonomously solving problems where the necessary information lies within the universe covered by the training data and accessible resources is not unrealistic.
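
For the "send Python expressions to be evaluated" kind of tool, a minimal sketch might look like the following. It assumes the model emits a plain arithmetic expression as a string; rather than calling `eval` on model output (which would run arbitrary code), it parses the expression with the standard `ast` module and allows only numbers and basic arithmetic operators. The name `evaluate_expression` is hypothetical, and how the expression is pulled out of the model's reply is left out.

```python
import ast
import operator

# Only these operators are allowed; any other syntax in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expr: str):
    """Evaluate a model-supplied arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. never reach the interpreter.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return _eval(ast.parse(expr, mode="eval"))


# The kind of string a model might emit when it decides it needs a calculator.
print(evaluate_expression("(17 * 23) - 4 ** 2"))  # 375
```

A real tool would also want a timeout or a cap on exponent size, since something like `9 ** 9 ** 9` is valid arithmetic but takes effectively forever to compute.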

Right, but if it needs its hand held, that ends up being a transcription task rather than a logical reasoning task. If you _tell it_ the solution to a coding job in detail, it can build you the complex entity you’re looking for. But if you just say, for instance, “write me a Python script that generates random chord changes (ex A#dim to Gmaj9b5)”, first of all it will just dump code without asking for clarification on requirements, and second, even if you do clarify the requirements, the code won’t work unless you explain the algorithm in depth.

Although, that’s just a personal anecdote.
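
The chord example shows how much such a request leaves unspecified. Here is one minimal sketch that makes the missing requirements explicit as assumptions: roots spelled with sharps only (so no Bb), a small hand-picked list of chord-quality suffixes, and "changes" meaning a sequence in which no chord repeats the one right before it. The quality list and the `random_progression` name are illustrative choices, not a standard, and whether something like Gmaj9b5 even belongs in the list is exactly the kind of question a human would ask first.

```python
import random

# Assumption: the twelve chromatic roots, spelled with sharps only.
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Assumption: a small, hand-picked set of chord-quality suffixes ("" = major triad).
QUALITIES = ["", "m", "7", "maj7", "m7", "dim", "aug", "sus4", "maj9b5"]


def random_chord(rng):
    """Pick a root and a quality independently, e.g. 'A#' + 'dim' -> 'A#dim'."""
    return rng.choice(ROOTS) + rng.choice(QUALITIES)


def random_progression(length, seed=None):
    """Return `length` chord symbols in which no chord repeats the previous one."""
    rng = random.Random(seed)
    chords = []
    while len(chords) < length:
        chord = random_chord(rng)
        if not chords or chord != chords[-1]:
            chords.append(chord)
    return chords


print(" -> ".join(random_progression(4, seed=42)))
```

Passing a seed makes a progression reproducible, which is handy when testing; leaving it as `None` gives a fresh progression on each run.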

> However, the moment I ask a real question that goes beyond summarizing something which is covered thousands of times online, I am immediately let down.

I'm very sure I said this from the start, against the ridiculous hype. Summarization of existing text is the *only* safe use case for LLMs. Anything else is asking for disappointment.

We have already seen it used as a search engine, and it confidently hallucinates incorrect information. We have seen it pretend to be a medical professional or a replacement attorney, and it has outright regurgitated nonsensical and dangerous advice. That makes it completely unreliable for those use cases, especially since (deep) neural networks in general are still the same black boxes, unable to explain and reason about their own decisions, which makes them unsuitable for high-risk applications and use cases.

As for writing code, despite what the hype squad tells you about both GPT-4 and ChatGPT, the ground reality is that they generate broken code from the start and cannot reason about why they did that in the first place. Non-programmers wouldn't question the output, whereas an experienced professional would catch the errors immediately.

Because of this untrustworthiness, programmers now have to check and review the output generated by GPT-4 and ChatGPT every time they use it in their projects, which is more work than before.

The AI LLM hype has only further exposed its limitations.

For a significant number of software developers, GPT and GitHub's Copilot have replaced StackOverflow, and even Googling more generally. It is more than an autocomplete; it is the best resource for software development by far, IMO. It's a tutor that's an expert in virtually every topic.

Yeah, it's not. Sorry to contradict, but it's not like that. In any kind of tutoring arrangement your time with them is limited, and if they're any good, they don't just regurgitate limitless example code. Two of the most important decisions an instructor has to make are how much access to give you and how much example material to give you, because the actual learning begins when you have to think for yourself and are forced to confront a black screen with a flashing cursor and fill it with your own ideas. So interacting with ChatGPT may be a great experience, but it's not that. Maybe someday it will be.

Tutoring assumes the skill is valuable to learn, that there is a need for more people who know how to do it.

We don't really tutor people how to write too much assembly anymore, or hand-compile code. So if you're arguing that ChatGPT meets the definition of a tool, or a servant, better than a tutor, fair, but if you're further arguing that that makes it somehow less valuable than a tutor (in this case), I'm not sure I can come along there.

Yeah, I definitely wasn't trying to quantify its value. ChatGPT definitely appears to be proving valuable to people. I was just challenging the idea that it's acting in a tutor/instructor/mentor type of role so far. While it seems like an interesting direction to take these LLMs, I haven't observed them doing that yet.

It really isn't. GPT-4 is certainly an improvement over previous language models, but when I vaingloriously gave it the questions from my favourite self-answers on StackOverflow, only one completion was immediately correct. The remainder were variously suboptimal, poorly crafted, overdesigned, incomplete, or downright wrong, requiring multiple re-prompts to coax into usable condition. They were all syntactically valid but tended to misconstrue the semantics and underestimate the capabilities of the programming environments concerned. Try it with your own, but to me it's more like coaching a bright but inexperienced junior developer with the "confidently incorrect" trait.

I have to completely disagree with this.

Where GPT-4 shines for me is when I have a project swimming around in my head that I want to work on for fun. It can get you off of the ground quickly, and for side projects the quality and correctness of the output isn't that important.

For professional software development, GPT-4 is still wrong way too often for me to feel comfortable using it. And it's not all that much faster than going straight to the source anyways.

When people just ask ChatGPT for solutions and there's no community à la Stack Overflow, where will it get the answers to future problems?

If ChatGPT is so successful that people stop producing content, it might end up in a local optimum that isn't so optimal.

Well, the difference is coming from both directions. ChatGPT is pretty amazing, but Stack Overflow has been self destructing for many years ahead of this.

Likely the future of training these systems will come from interacting with their users and perhaps directly with the tools and compilers too. They can learn from that without needing a new corpus of human-human interactions.

No, even expert tutors know how to say “I don’t know” in the face of uncertainty, instead of remorselessly spitting up nonsense as language models do.

I don't agree.

I still use Stack Overflow regularly as an engineer.

Sometimes GPT-4 will have a quicker tailor-fit answer, but sometimes it will flounder as well.

Expert as of 2021, which is obsolete for many software dev purposes, not that SO is much better.

Similarly, when I think of ChatGPT as a really cool and advanced search engine frontend, its behavior, including its limitations and its failures, makes the most sense to me.

It's a language model, not a search engine. It doesn't work well as one unless integrated into an actual search engine, as Bing does. Without such integration, it's much closer to human memory than to a search engine: it will recall stuff it has seen many times pretty well and completely fail at stuff it just glanced over once, filling any gaps with made-up stuff like a kid on an exam hoping to get at least a few points with their wild guesses.

Yeah, I think we're talking about different things (and per my comment, I didn't say that it was a search engine). I'm reasonably well aware of what it is and what it's made of; I'm talking about a mental model for understanding and predicting when and why it works well vs when it doesn't.

And what I've found so far is that when I place it in the same mental bucket as the interface to a modern search engine (not the search engine itself, but the interface for both input and output), it actually fits in pretty well there in many ways. Not in every way, of course, but things like the nuances of crafting prompts and how a scarcity or abundance of reference material affects its output.

> I'm talking about a mental model for understanding and predicting when and why it works well vs when it doesn't.

I'm talking about it too. If I enter a specific phrase into a search engine that can only be found on a handful of websites, I expect it to return those results to me. For example, typing my company's VAT ID will return a bunch of information about it from various sites. This is absolutely not going to work with an LLM; instead, at best it may notice that what you typed looks like a VAT ID and then proceed to give you information about a company it completely made up. The mental model of understanding what works with LLMs and doesn't is drastically different from a search engine. Human memory on steroids is a much better (though of course still not perfect) model.

Again, we seem to be talking past each other, sorry. I'm really, really, really not talking about the search engine itself. I'm talking about the hunk of tech that makes up the interface layer between the human and the search engine, and the fact that that hunk of tech can be hooked up to a search engine is interesting but not entirely germane.

If using the analogy of human memory works for you - that's great! To me, it's not as good a fit, but that's ok.

> The mental model of understanding what works with LLMs and doesn't is drastically different from a search engine

Agreed! But again, that's not what I'm talking about. :)

> I'm talking about a mental model for understanding and predicting when and why it works well vs when it doesn't.

That's what you said earlier you were talking about, and that's what I replied to. Now you're saying that you're in fact not talking about "the mental model of understanding what works with LLMs and doesn't" at all. Seems you have to improve your communication skills mate ;]

What I'm saying is that using LLMs while imagining them to be kinda like search engines is just a way to get burned by hallucinations and disappointed with poor results. They don't work even remotely like search engines, neither internally nor for an external observer. For some kinds of input they may trick you into believing they do, but that impression falls apart pretty quickly once you actually exercise it. That's how you get people who are genuinely shocked that ChatGPT gave them references to papers that were completely made up, for example, which shouldn't surprise anyone using this tech at all, as that's just how it works.

> a really cool and advanced search engine frontend

This is the saddest version of ChatGPT I can imagine. I found that as search engines emulated natural language, their results got steadily worse.

I just want the Google results and interface from a long time ago.

> I found that as search engines emulated natural language, their results got steadily worse

I would wager that that has not been the experience for the general population (read: non-technical people) and/or that degradation of results has not been because of emulating natural language but because of other factors (like advertising dollars).

Search engines have become vastly more accessible for non-techies over the past three decades. Sure, even today a techie is usually able to coax higher-quality results out of a search engine, but it's still a pretty recent advancement that an average Joe can just announce their question out loud and a device on the shelf will not only figure out what they are asking with a decent degree of accuracy, but also go search for something relevant, extract an answer, and then speak it back to the user in a pretty sensible way.

It is in this sense in particular that ChatGPT feels like a natural progression for search engines.

I completely agree that my experience has not been the same as the general population's. But that doesn't really help me. My searches are still worse. Just find me pages that match the text I specify, please. Add some boolean operators and I'm happy.

And because the majority of people have a better experience, I dismiss your second option of other factors being at play.
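
"Pages that match the text I specify, plus boolean operators" is classic boolean retrieval over an inverted index. Here is a minimal sketch, assuming a tiny made-up in-memory corpus and naive whitespace tokenisation; real engines add phrase queries, stemming, ranking and a great deal more. The point of contrast with an LLM is the last line: a term that appears nowhere returns nothing rather than a plausible-sounding guess.

```python
from collections import defaultdict


def build_index(pages):
    """Inverted index: map each lowercase term to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, all_ids, must=(), any_of=(), exclude=()):
    """Boolean query: AND over `must`, OR over `any_of`, NOT over `exclude`."""
    results = set(all_ids)
    for term in must:
        results &= index.get(term.lower(), set())
    if any_of:
        results &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in exclude:
        results -= index.get(term.lower(), set())
    return sorted(results)


# Hypothetical three-page corpus.
pages = {
    "p1": "python asyncio tutorial",
    "p2": "python threading tutorial",
    "p3": "rust async tutorial",
}
index = build_index(pages)
# tutorial AND (python OR rust) AND NOT threading
print(search(index, pages, must=["tutorial"], any_of=["python", "rust"], exclude=["threading"]))  # ['p1', 'p3']
# A term that appears nowhere returns nothing rather than a guess.
print(search(index, pages, must=["golang"]))  # []
```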

> And because the majority of people have a better experience, I dismiss your second option of other factors being at play.

That's fine, though the point I was (clumsily?) trying to make was that there are different factors here that allow multiple things to be true at the same time: power users routinely feel like search result quality is going down, and I think you can pretty objectively show that to be true in many cases.

Simultaneously, though, the barriers for "normal" people to do decent searches have dropped dramatically: there was an accessibility hurdle that was previously challenging for a lot of people, and it's vastly better now than just a few years ago. This too, I believe, can be shown to be objectively true in many cases (anecdotally as well: just last week I watched a number of very non-technical senior citizens get what they wanted out of Google, and I didn't see much evidence that it was because of their skill at crafting good search queries).

> I am immediately let down

Why? I'm not sure how you could expect anything else in the first place.