The first steam engines were also written off as being less powerful than a horse.
The first electric motors were written off as being less powerful than steam engines.
So it goes.
I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.
Early adopters will use the technology and do amazing things with it. Unions are already pushing back on AI (truckers, federal employees in Canada, writers in Hollywood) and maybe rightly so. At the same time, dismissing these technologies because they don't meet your high standards yet is probably foolish.
The first segways were written off... and now you have electric scooters on demand on every city of the planet.
The first NFTs were written off... by you and other writter-offers. If you didn't write them off back then and minted some you'd have made a pretty penny by now.
I don't know, I'm not sure those examples strengthen your case.
Electric scooters and Segways are hardly the same thing.
Plus, I doubt the Segway came before either the electric scooter(and electric bike).
And that some people made money on NFTs doesn't make them useful. For a brief moment, a lot of people got scammed so others could prosper. Hardly a world changing technology(at least not for the better)
> Unions are already pushing back on AI (truckers, federal employees in Canada, writers in Hollywood) and maybe rightly so.
Interesting bit of historical trivia: In the US, the main truck driver's union is the International Brotherhood of Teamsters. What is a teamster? Historically, it was a person who wrangled a team of horses or oxen to pull a wagon. That profession was effectively eliminated by the creation of the internal combustion engine.
Two of the other main trades that were involved in the rise of unions are longshoremen and stevedores. The former pull cargo off ships and get it onto land. The latter organize cargo on the ships.
The creation of shipping containers dramatically reduced the need for those jobs, leading to some of the longest strikes in the US. While the unions technically "won", the strikes mostly incentivizing shipping companies to push even farther into mechanization so that they were less reliant on labor. There are far fewer dockworkers and longshoremen today than there were before containerization.
More on the port workers: today you'll find that skilled union port workers are extremely well-paid. Workers in the cranes that pull containers off the ship, for example, are paid >$300k.
They are quite skilled, but so are many other skilled tradespeople. Electricians and machinists for example do not make that kind of money despite their skill. So why do crane operators make so much?
When containerization, and subsequently other types of automation, hit ports, the union resisted fiercely but ultimately had to work out a deal with the port operators.
That deal was to reward the most tenured union tradespeople with much larger pay packages, at the cost of the less experienced tradespeople, who would have to find different work.
This agreement became agreeable to both sides, since the operators could still massively reduce the workforce, and the only price they'd have to pay would be high salaries for the union workers who remained. And the union was able to reward their longest tenured members.
It's hard to fault the union for this: the alternative was likely both huge reduction in workforce and less attractive pay, so they at least got good pay out of it for the remaining workers. But it made a lot of the less tenured union workers resentful because they felt the union sacrificed them in favor of the union 'insiders'.
The book 'The Box' by Marc Levinson is a great coverage of this topic, if steel shipping containers are the sort of thing that get you going.
I use it almost on a daily basis now, and I pay for the monthly subscription. It’s not magic, but it can save me a lot of time sometimes. I use it in my job as a software engineer, and I mostly use it to create unit tests. The code usually needs a lot of love, but it gets me started.
Like bears in space? ))) seems like a great way to delve into known topics so that you are able to catch it when it starts to confidently and plausibly hallucinate
Basically I described the problem I am trying to solve; I ask it to behave as a seasoned professor of comp sci or w/e depending on my hunch, we go back and forth where I ask questions, add constraints to the system, and in turn it gives me ideas into what I should look into more.
It's the Socratic method on steroids, where the student is probing a possibly fallible professor with the library of Alexandria at their hands.
It is by no means perfect, but it allows me to identify what general field I am going into, what I should read about it, what core concepts are important, how to expand my knowledge, and most importantly, helps me better formulate the problem by asking questions or failing to understand what I want to convey.
And we should blindly trust anything you read on the internet written by a real person? It's still valuable as a "search engine" with a different interface, especially if you can describe the problem but don't know the words to search for. For me that's a common issue when jumping into a new space.
no, you should not blindly trust anything you read on the internet, why did you think that? If anything, ChatGPT is actually doubling down on this "i'm the single authority" mode.
google, while far from perfection, gives me a selection of links that i can scan and see for myself which one of them makes more sense.
even looking for something stupid and potentially spammy like "chili soup recipe", i learned there are different opinions on how to make it, and i learned that apparently in Texas using beans is considered somewhat of a blasphemy. GPT did not mention any of that, just authoritatively barfed out one random recipe without any nuance or even any sources at all.
ChatGPT uses the word "delve" a lot. If it wasn't for the lowercase "i" in your message, I would have been skeptical that this was a message generated by chatGPT. Perhaps your speech tendencies are merging with those of ChatGPT. Before too long your independent thoughts and writings may accidentally be flagged by GPT detectors, haha.
“Admission to the private beta for GitHub Copilot chat is limited and requires an active subscription to GitHub Copilot. Signing up does not guarantee access.“
Curious if you've queried ChatGPT on topics that are not new to you and been impressed with its conveyance of relevant information?
I think there's value here, but I also think people are way overestimating the quality of the "knowledge" they think they're receiving from it on novel topics.
The topics that are not new to me and are of interest come up after its training-dataset-cutoff, thus it can produce bogus information.
I think ChatGPT and the likes should be thought of as cartographers who when probed can provide an outline of regions of the knowledge space and give hints as to what could be useful.
Given then a high level but also imprecise map, I can ask the GPS for clarity as to where I am going.
The problem is that I often don't know where that is and thus I don't know what to ask from the GPS, but I can describe it to the cartographer, and they can draw a map for me.
In this analogy, the GPS is search engines / books / papers.
No, it's objectively shit for a lot of things. The more technical and abstract something is in its concepts, the worse it becomes. It is just an LLM and that means it is inappropriate by its very nature for most things.
> I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.
Yes, because people think that LLMs are almost AGI based on the social media reactions and can't imagine they still have unknown/unsolved problems. But if we take a look at the 14 years of self driving car development, it becomes clear how AI can be both amazing and not good enough at the same time.
> Yes, because people think that LLMs are almost AGI....
Surprise, surprise... this has happened before:
> Lay responses to ELIZA were disturbing to Weizenbaum and motivated him to write his book Computer Power and Human Reason: From Judgment to Calculation, in which he explains the limits of computers, as he wants to make clear his opinion that the anthropomorphic views of computers are just a reduction of the human being and any life form for that matter.[29] In the independent documentary film Plug & Pray (2010) Weizenbaum said that only people who misunderstood ELIZA called it a sensation.[30]
And, it's easy to see why. You can talk the damn thing, and it talks back! People love to anthropomorphize things, anyway, but if you can talk to it and it talks back, people think there's got to be something to it.
This time, though, is a little different. GPT-3 and GPT-4 actually do behave like they understand natural language to a great extent. That makes them directly analogous to Searle's Chinese room construct, and suggests that they could actually pass the Turing test (if suitably fine-tuned).
This is great, because, as you say, it's amazing. But I also think it's not good enough, because the fact that GPT-4 may be able to pass the Turing test really says more to me about the limitations of the Turing test than anything else. Likewise with the Chinese room analogy: we know what's in the box, and we know it shouldn't be trusted.
But, you're not going to get that kind of analysis from the general public.
LLMs are different from self driving cars in that they can be useful even when they make wrong decisions occasionally. Copilots, document drafting (legal, copy, etc) and summarisation are useful services that people and enterprises are currently enjoying.
AVs have also struggled with regulatory muddle, which is partly my point.
Self-driving is very possible in many situations and if there was a "Manhattan Project" for self-driving to be up and running by 2025 I think we could do it... But there are so many vested interests that this won't happen.
... and then everyone is disappointed.
BTW, I'm not saying this is all bad... Everyone asking for a 6-month AI research moratorium gets it indirectly via societal inertia and regulatory muddle!
The difference here is that steam engines were "less powerful than a horse" in an easily quantifiable, easily diagnosable way. They produced fewer newtons of force. You could tell this was the case because your mechanism just wouldn't move when you wanted it to. Most new technologies followed this pattern, they quantifiably underperformed alternatives until the field matured. But AI doesn't act like a dumb human who's missing information or is inept at the task presented to them. It doesn't refuse to answer if you give it a question that's too hard for it or requires info that it doesn't have in the training dataset, it confidently makes stuff up and then covers up for the fact that it made stuff up by burying it in marketing copy and extraneous info such that you need to be an expert in the topic you're using AI for to even tell that it failed. Better AI models do help with this, but they simultaneously improve the AI's obfuscation abilities to the point where fatal flaws in its output are going to be even harder to catch than they are now with human review. It doesn't have the same risk calculus as a human, it doesn't care whether the marketing copy you're writing describes your product as wonderful and perfect or if it's providing you completely bogus legal advice that'll land you in jail for a decade if you follow it.
And this is all before we even bring up the topic of prompt injection, a problem so intrinsic to the technology that OpenAI doesn't even take bug reports on it because bug reports "are for problems that can be fixed".
One of the biggest problems in AI for the last 60 years is the grounding problem. The ability of a model to be rooted in objective reality. In other words, for one of these LLMs to understand when they are being accurate vs hallucinating. None of the current crop of LLMs has come close to solving this problem. On the contrary, they make the problem blatantly obvious. No LLMs will achieve AGI until this is solved sufficiently that the answers of an LLM can be depended on without complete independent secondary verification.
In my understanding, LLMs already hit a big wall. We can't increase the size of models mainly because it's too expensive, but also doing so may not be as effective as before. We've also run out of data. The free lunch is likely already over, for now. It's unlikely that we'll see huge improvements in the direction we've seen during recent years.
Instead, what I see is that the first letter 'L' is getting smaller. People are working on (relatively) smaller specialized models. But it means these models are unlikely outperform larger LLMs (in the direction mentioned above).
He wasn't wrong that it has "no wireless" and less space than a Nomad (whatever that is). He was absolutely wrong that it was therefore "lame" i.e. unable to deliver significant value in spite of these limitations.
A more modern example is the first iPhone, the first gen was bad even by the standards of the day. If you look passed the novelty of having a lightsaber app on your phone it was terrible.
I had a Motorola Q at the time and the first iPhone was light years beyond it, even if the only metric used to compare was browsing the internet. Most sites were barely functional in Windows Mobile IE.
But it had no apps, the analogy works, the iPhone did some things great (web browser:rewrite X in the style of y or whatever) and had some gotchas that seemed like a big deal at the time but were then resolved, or everyone realised it was so good they didn't matter (maybe no keyboard: hallucinations, we thought this was a problem but it's not, and no apps:no knowledge of current events, easy fix)
Browsers in other phones sucked. I'd owned high-end "smart" phones before the first iPhone and ultimately went back to a flip phone because they just weren't worth using. Usability of the first iPhone browser was a huge leap forward.
Ironically, that was largely because it let you more easily use sites built for desktop, due to the larger screen space and ease of pinch to zoom. Those older phones would have been somewhat more useful if mobile first/responsive sites had been a thing then, but it took the popularity of the iPhone for that to happen.
At the time you pretty much had palm and wince and feature phones. Of the three wince was probably the better of the interfaces (but that was a mater of taste). Data was expensive to buy on most carriers. Which all the other carriers mimicked within a couple of months of iPhone coming out. The iPhone was decently better than the other two and the 'unlimited data plan' and the bling bling of 'apple'. Also the browser being worth anything. The built in ones for all the others were junk.
Then people started sideloading and basically showed Apple they needed a store which they quickly came up with. Getting an application on the other two platforms at the time was mind numbingly bad (activesync was to put it mildly awful to use). In some cases you needed to get the carrier involved (better have a few months to validate and a few hundred thousand dollars to pay for it).
Also that screen they used was way better than what any other phone out there had at the time. Most of the top end phones needed a stylus and itty bitty keyboard to be any sort of useful.
I would say it was not until the droidx came out that anyone had anything that approached how cool the iphone was.
Mobile browsers were a painful experience before that.
Mobile keyboards were a painful experience before that.
I think Blackberry was the only one that did both OK enough to take seriously. I know people loved the Sidekick, but I never used it and don't recall if people used a web browser on it, or just text messaging.
The first iPhone was more impressive than a Blackberry.
Most feature phone browsers were very limited compared to what Safari supported (almost the full web experience at the time). You'd have to look at other "smartphone" class devices for something comparable, and those were not common (EDIT: among consumers) in the US at the time.
You are completely ignoring the reality that Apple products are considered better than peer products even if they are objectively equal. The public doesn't care if Apple wasn't the first company to put a web browser on a phone. The public knows that they like their iPod, and the marketing for the iPhone made it compelling.
> the marketing for the iPhone made it compelling.
That's part of my point. There was a tremendous amount of hype for something that was incredibly limited and objectively bad. However eventually it became very good and took over the world.
I'm not sure "bad" is fair. But the app ecosystem wasn't really developed, the network connectivity wasn't great, and there were probably a lot of other shortcomings especially in retrospect. I had a Treo at the time and didn't upgrade for a few years to the 3GS which, as I recall, was when the iPhone really took off.
The iPod had a somewhat similar trajectory. The first gen version was pretty much just another MP3 player and iTunes didn't even run on Windows at first.
No, because next word predictors are fundamentally limited in their capabilites and have interest problems. This isn't something you can just iterate on to fix. You need a different architecture.
the worst was it was ATT only and they had a coverage hole on the block where I lived so I had to get rid of it for something supported by verizon. didn't go back to an iphone until 10 years later.
> The first steam engines were also written off as being less powerful than a horse.
Which steam engine do you mean and do you have a citation for this comment? The first industrial steam engines were based on the Newcomen design and were used to pump water out of mines. Their big drawback was efficiency, not power. They were only economical in coal mines, which had fuel immediately available at near zero cost.
> I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.
This is pretty much what I've seen. ChatGPT (that's 3.5, right? GPT3 was interesting but still pretty laughable) was a massive step forward, and incredibly exciting to witness and interact with. But it still does have limitations, especially if you try to separate hype (which comes from an ecosystem of people who have incentives to hype it) from reality.
> The first steam engines were also written off as being less powerful than a horse.
This may not be the best example considering that steam engines were around since at least 20BCE[0] but the first successful application wasn't till almost 1700.
Large Language Models aren't a silver bullet – they don't solve all your problems. But they are a holy grail – as a universal common sense module they give IT systems a capability they never had before, a capability which has been sought after from since computers became a thing, a capacity for common sense.
We now have that capacity and that alone will revolutionize the world. The chatbots aren't about chat, they are about common sense.
Like the article, I am only talking about technology that already exists although the progress in deep learning is still super-exponential.
We will certainly achieve AGI during this year as it will only require making these systems self-play like we did with AlphaGo -> AlphaZero -> MuZero. Self-play, or reinforcement learning with machine feedback will skyrocket the performance of these systems in language domain, which conveniently encompasses much of what is still missing for AGI.
You've got it exactly backwards. They're not about common sense; they're about chat.
LLMs act in insensible ways all the time. They contradict themselves. They hallucinate. If you ask an LLM to follow a simple but long logic puzzle and show you its work, it will often make extremely obvious errors and fail to notice even when you ask it to review its work.
What LLMs can do is coherently string together language. That requires sophisticated linguistic understanding, and LLMs are pretty impressive for it. But merely understanding language is not intelligence, or even common sense. I suspect we're approaching the limits of what we can get out of statistical language generation.
Actual AGI would require a logical model of the world, not a probabilistic model of language. Looking at non-human animals, we can see problem-solving evolved long before language did. Language is a second-order phenomenon we use to express first-order problem-solving conclusions about the world, and I'm skeptical that we'll ever manage to accurately recreate first-order problem-solving by training models on the second-order linguistic artefacts of problem-solving. It's actually very easy to create superficially-realistic second-order problem-solving artefacts which, when examined with first-order problem solving capabilities, don't stand up to scrutiny—i.e. contradictions, hallucinations, and faulty reasoning. I suspect that, when computers do learn to problem-solve, it will have been by training to solve problems.
There's also no way we'll get the kind of growth you're predicting from self-play. When it comes to a simple competition like a board game, it's easy to optimize for more skilled play. But there's no objective way to win a conversation. Maybe we'll get some gains out of training these systems against each other, but it's just not a clearly viable use case.
> Like the article, I am only talking about technology that already exists although the progress in deep learning is still super-exponential.
> We will certainly achieve AGI during this year as it will only require making these systems self-play like we did with AlphaGo -> AlphaZero -> MuZero. Self-play, or reinforcement learning with machine feedback will skyrocket the performance of these systems in language domain, which conveniently encompasses much of what is still missing for AGI.
There’s an important difference between exponential and sigmoidal curves. The early stages are indistinguishable, and not enough time has passed to judge.
Personally, I don’t think AGI is possible with current techniques. You say all that’s needed is self play or RLHF. This is categorically not true. It doesn’t even guarantee that AIs will ever care whether they’re alive, a fundamental property of sentience.
There is likely a definitional gap here. Sentience is unnecessary for intelligence for my definition of intelligence, but agreeing on a common definition has been tough when we understand it so poorly.
I happen to also disagree with "caring" (requires definition) being relevant to sentience, defined as the
ability to perceive or feel things.
I’m sympathetic to these arguments, but AGI to me is Data from Star Trek. I think most people would agree.
He has curiosity. The current gen of AIs don’t. They don’t even ask questions, let alone remember anything.
He has a capacity to get bored. He tries out guitar just because he wants to. He paints. He’s frustrated when the details aren’t right.
A lot of these traits are human. But that’s the whole point — we’re trying to make a wo/man in a machine.
I’ve never understood the hype, and I’m a researcher. It seems to me that there is a vast gulf between what AIs are capable of and anything that makes being human, human.
I believe they’ll get progressively better at intellectual tasks, though. That will be really disruptive.
As exciting and transformative as GPT3+ is, let's not get too hypey.
You need to back up outlandish claims with actual evidence and references. As discussed many times in this forum there's no evidence of sentience or any reason to consider the current systems to be even on the path to AGI.
Uh, nope. Being able to spur out text is far from understanding what common sense is. If it did have the common sense, why would OpenAI struggle so much with filtering? Because the model doesn't comprehend what it generates. It's only capable of interpolate textual data it witnessed. The sense of common sense is merely an illusion created by the brain, which also loves interpolating whatever there are.
It's pretty straightforward to build an RL environment for closed systems like chess but I don't think it's close enough for an AGI to learn. Like RLHF uses human feedback. Unless we come up with a way to scale that process AGI by this year doesn't seem possible
Wait until they can watch TV to learn (I am serious). If you imbue them with competitive play, oh boy. We just gotta figure out what they think funny is.
I'm surprised this article is getting upvoted - it feels like very lazy journalism to me.
> The discomforting reality is that, while Altman and his ilk have been predicting an exponential acceleration of productivity, we have been experiencing a deceleration.
This is a very big claim, and there is absolutely nothing to back it up. The only specific reference to productivity is about an MIT paper that showed increases in worker productivity (but the authors of this just wave that aside as unimportant because they didn't think the work it was doing was important).
> More dangerously, ChatGPT can make authoritative statements that sound believable but turn out to be false if investigated closely.
We get it! We know! But look, this is a bad use case for GPT. If you pretend that it only has a single use case, and you pick the use case that it's worst at, you will think it's bad. This is just so, so lazy. No references to summarizing docs or writing code/SQL queries/Excel formulas or any of the other things that it's genuinely useful at.
> At best, LLMs can be used for rough first drafts of low-value writing tasks with humans filling in the details and checking for rants and lies.
Rants? Come on - GPT hallucinates, but it's not an unhinged lunatic that goes ranting about stuff. Also, again, this is not all they can be used for - it just ignores all of the better use cases.
> What about Altman's vision of humans appreciating art and nature while most of the world's goods and services are produced by AI? We have a lot more respect for the work that people do than for the usefulness of LLMs.
Huh? It's great that you respect the work people do, but that has nothing to do with whether they'll affect society.
> ChatGPT is entertaining but it is, at most, a baby step towards an AI revolution and, at worst, a very expensive detour away from the holy grail of artificial general intelligence.
What? This is the closing to the article and it just throws out this enormous claim, which is backed up by absolutely nothing. It's demonstrably a big step towards an AI revolution - if nothing else, it's brought a ton of money and interest into the space, which is certainly important for a revolution.
But to say it's a detour away from AGI and then give absolutely no explanation of why that is or what direction AI research should be going? This is very poor journalism.
It's really easy to get an LLM to hallucinate by asking an open ended question - the type typically answered by a Google search or checking Wikiedpia. However, this is not the best application of LLMs. This criticism is getting old.
LLMs are great at:
- Text synthesis given all of the facts in a prompt (expand these bullet points)
- Summarization (condense this text)
- Data extraction (fit this data into this schema)
It seems to do ok at coming up with starting points or give you options if you're stuck. But the quality of the prose it comes up with is indeed awful. It gets a bit better if you ask it to write in the style of a specific author, but marginally so.
I guess maybe it gets to mediocre fan fiction level.
That's still pretty impressive, but not very usable for creative writing yet.
I wonder if there's a series of steps(prompts) that could be used to get it to out put something much better. I know I've used it to write something I would have never even attempted let alone tried to write on my own. It came out ok, but better than I could have done on my own.
Some do. Others will just sit down and write. But the problem isn't so much that it can't handle plot. That is amenable to process - there are huge numbers of different processes, and one that might work well for GPT is something called the Snowflake method, which is basically iterative refinement. E.g. start with a one line description, expand it to a paragraph, expand each paragraph to a paragraph, then to a page or a few, and eventually to a list of scenes, and write out the scenes. Oversimplified (there's some steps with character sheets etc. too).
For that it might well be useful, because you could do one iteration at a time, edit the output to keep/reject ideas and do the next step.
But the challenge is that while it might not be "easy", it's the less time consuming part of a novel (certainly has been for me). The time consuming part is writing out the scenes, and the part GPT so far is awful at is the prose. So even if you manage to get it to produce a coherent script setting out what should happen, you still (so far) will have to expect to rewrite the entire thing anyway. That may or may not be useful to you. For my part I suspect I'd write faster from scratch than trying to edit and keep it consistent.
That said, given how far it's gotten I wouldn't at all be surprised if it can get to reasonable prose in another couple of versions.
It is good at inferring the correct people into a story. But the story many times leaves something to be desired.
Other times though I did have a lot of fun having it spit out SCP stories. As those can many times have a ton of template like logic to them. Due to the nature of SCP being written in a tone of a formal report. Plus well over 2000 different examples.
Also some of that could be due to lack of training data. Like a TV show might be 4 seasons long and a particular character may have had 3 or 4 lines total. It would be like asking it to write a story about Boba Fett given the original 2 movies where he showed up and had maybe 1 or two lines. There just is not enough to extrapolate anything. But you ask it to write something about Harry Potter and it probably could get the style close enough as there is more training data.
My biggest grip is sometimes it just gets stuck in a loop. Once you are in one, the thing just will dump out the same hallucinations over and over.
Try Anthropic's Claude[1]. I've found it to be better at creative writing than GPT4 or even Claude+.
That said, it's still not great, though sometimes you can luck on to finding a gem in what it writes.
I've also had luck in giving it examples of the sort of thing I wanted it to write and asking it to write something similar, but with certain modifications that I wanted it to make.
Giving two or more examples and asking it to combine them is also fun.
I wonder if an LLM trained on your favorite author how many words/sentences paragraphs it could generate in the middle of a book that would be basically undetectable.
You don't even necessarily need to train it specifically on their writing. Just giving an LLM an example of the sort of writing you want and asking it to write something similar is sometimes enough.
But, yeah, training it specifically on a corpus of work would probably be even more effective.
I'd love to be able to do that and get output that's at least on the GPT4 level. I think we'd probably have to have a breakthrough in LLM architecture and/or some amazing advancements in hardware before it becomes practical and cost effective for individuals to train their own GPT4-level LLMs, though.
I've built internal systems that do summarization based on knowledge retrieval systems for specific nonpublic corporate information.
With GPT-4, I find very little hallucinating. It very rarely deviates from the source material. Every time I've found something unexpected, there was a problem in the source material provided to the model.
To be fair, the ones I've seen use a form of point 1 (giving all facts in the prompt) by allowing for searching the web, which becomes a version of point 2 (summarization).
Regarding the last point: What I still find the most entertaining is how easily you can change its personality, especially via the system prompt. You can get it to be rather snarky, even sometimes insulting, which makes for hilarious IRC bots.
As the hype phase has passed (probably), now we will see a bit of overcorrection with these dismissive articles. Sure, LLMs as they are now aren't anywhere close to true AGI and even Microsoft admitted it. But its potential is not something anyone can ignore. The capabilities of LLMs has already been successfully used by millions of people and startups. It is a groundbreaking improvement that makes at least one field of study nearly obsolete (NLP). It captured attentions of both corporations and government who are pouring billions into it. All of this in the span of one year or less.
With the multimodal models coming next and still exabytes of videos, games, sound, musics, etc. data to train them, we aren't peaking yet. Sure, it isn't the holy grail. But it is a really valuable treasure that only a few exist, to use the same analogy. To view it so dismissively because of some drawbacks, which are entirely obvious and can be accounted for, is just arrogance.
> It captured attentions of both corporations and government who are pouring billions into it. All of this in the span of one year or less.
Corporations and governments have thrown tons of money into technologies that ended up going nowhere. We're only a few years out from everyone dumping their money into "blockchain solutions", which turned out to go nowhere.
Investors and government stakeholders are easily swayed by hype. Sometimes this hype is well placed, but often the hype results in throwing money at projects that don't produce anything of value. Hype just isn't a good measure of a technology's long term viability.
When only a few of them followed the hype, yes, it can possibly go nowhere.
But when the entire industry, experts and non-experts included, are fascinated and obsessed with the same thing, it is more likely to be something real. An easy example is the first iPhone.
Another more negative example is bitcoin which even though it is probably a scam, its values and influence on society has massively grown more than what it was 1 year after released. Even though it has been a disappointment technologically.
> But when the entire industry, experts and non-experts included, are fascinated and obsessed with the same thing, it is more likely to be something real.
This is a perfect description of cryptocoins and similar technologies. I've witnessed literally illiterate people buying coins and selling the idea to others.
I know. I mentioned that crypto is a disappointment technologically. But it doesn't change the fact that it still brought massive profits and significantly impacted society. For worse but still... The point is with this much momentum behind a single tech, it will surge forward regardless of whether it has true merits that can live up to its hype or not.
Sure, anyone that uses ChatGPT knows it's currently not perfect.
But there's a presumption that these tools are going to keep improving over time. Which is presumably why the AI hype is so strong.
Whether AI ends up displacing people from their jobs in the long term, well, that's impossible to know. Just because no technological advancement has ever done that in the past doesn't mean it will never happen in the future.
As long as the accuracy of an LLM’s output is unknowable, there’s going to be a pretty hard limit on the kinds of jobs these tools can “replace”. And its not at all clear that this fundamental problem can be fixed at all with the current approach.
A tool doesn't have to obviate a worker's contributions 1:1 to replace them.
If one person can now do the work of 1.5 people, then the number of people needed for a profession shrinks, all else equal. For example, a professional translator may be able to do 2x the work by leveraging LLM/other AI, even though you still need them to validate the results. If productivity doubles, then only half the people are required to meet current needs.
The mistake is in believing that LLM's output should be deterministic to be useful.
Human output is not deterministic.
Fields with text-heavy output are already being upended by this. Being able to summarize long legal briefs, identify contract problems, do classification of discovery documents, or even write first drafts of common legal forms is already upending the legal discipline.
Chat-based customer support agents are seeing 25% productivity improvements based on two-year-old models for new employees, according to a study published in NBER.
Things like BabyAGI and other sequential "do anything" tools appear to be close to useless now, and unfortunately that is what is catching a lot of hype on Twitter. But actual industry applications are much quieter (often NDA) and much more impactful.
Not understanding why this is an issue for LLMs but not humans.
This is a simple commercial decision to make governed by three factors.
1. What is the cost of making an error?
2. What is the cost of the human doing the work?
3. What is the likelihood of the human making an error?
It's just evaluating how much more likely AI is to make an error than a human, by the cost of that error, set against the savings by using fewer humans.
Look at the legal profession. Sometimes the cost of an error is high, but usually it is not. There are already tons of little errors in contracts and discovery, and today they're all human. And people are very expensive. There is a giant swath of legal work that looks very attractive to automate at less than 100% accuracy.
Customer service: people offer poor customer service all the time, and usually the cost of that error is low. Human customer service isn't as expensive as legal work, but it's still relatively expensive. Very attractive to automate at less than 100% accuracy.
> Not understanding why this is an issue for LLMs but not humans.
Because humans have the capability to understand where their information comes from, and thus give enough meta-information to evaluate an accuracy rating, even if not all of them are good at it all the time.
I understand that there has been some effort to build this capability into LLMs, and that it works a little bit for some of them, but it is not something that most of them are fundamentally capable of.
Humans can make mistakes and lie, and we've been able to deal with it by checking their work, giving feedback to help them improve, placing less trust in those who habitually lie, etc..
LLMs making mistakes and "hallucinating" can be dealt with in similar ways, and as this is an open area of research with lots of proposed solutions and probably many more in the years to come, we do/will have plenty of other ways to deal with it too.
It’s not the first time we’ve been here , either, with AI although t this time it’s a bit more in the public , ie retail sphere. There are people who will confidently tell you that LLM are the next transistor level invention, and people who will tell you it’s more incremental, like eg an electric pressure cooker - improvement in some ways over what came before, got lots of people using them, but not fundamental. I’m sure there is a better example.
Anyway, the truth is nobody actually knows at this point .
This reminds of all the hype for self-driving cars a few years back. Self-driving systems performed well for 95% of driving, and it seemed like only a matter of time before the last 5% was ironed out.
Turns out, the last 5% was both extremely difficult, and extremely important. It turns out that a self driving car that randomly makes dangerous maneuvers isn't desirable. Similarly, a LLM that occasionally outputs plausible sounding bullshit quickly turns from a useful tool to something actively harmful.
As far as I understand, LLMs with 95% correct answers are much more useful than a car that doesn't crash 95% of times (if you need to pay attention to correct mistakes, you may well be driving).
A 95% correct LLM might be utter garbage in some areas but nearly flawless (thus reliable) in other, menial and time consuming tasks, such as summarization, rewording, providing new ideas, etc.
I think a lot of people on here are for some reason believers in the idea that if a technology has detractors, then it must be another case of the steam engine, human flight, or some other technology that had doubters before completely revolutionizing our world. In reality, there is no such law of the universe that says that some technology will be wildly successful because it is heavily controversial, and, in some cases, it turns out that a lot people were correct in predicting a technology's short/long term uselessness (crypto, web3, AR). Every time some article is posted highlighting AI's shortcomings in relation to its posited ubiquity in professional settings about 10 people wax poetic about how the internet/cars/etc. were doubted heavily, when they clearly are not similar in nearly any regard. I wish we could appreciate new technology without blowing its applications out of proportion and then being disappointed when it falls short of an impossible bar, which is my main gripe with both AI doomers and people who are entirely dismissive of the technology (despite basically nobody saying anything of the sort).
> I think a lot of people on here are for some reason believers in the idea that if a technology has detractors, then it must be another case of the steam engine, human flight, or some other technology that had doubters before completely revolutionizing our world.
I think that's a misinterpretation - I don't think that it's a revolutionary technology because it has detractors or because it's controversial; I think it's revolutionary because of its capabilities. Those comparisons just serve to point out that there are plenty of historical examples of people criticizing things that turned out to be revolutionary, and the same may well turn out to be the case here.
I tried two of the "failure mode" examples given: The Russian bears in space, and walking across a river with an average depth of 3 feet. The examples fail in GPT-3.5 while GPT-4 gives the "correct" response. The article is already out of date.
If that was your point, then you expressed it poorly. You wrote: "The article is already out of date." (emphasis added). That implies that the article was out of date as of the time you wrote your comment, or at some unspecified point prior to that, but, in fact a stronger and more specific claim is true, namely, that the article was out of date as of the time of its publication, approximately 2 months ago.
Checked SO, Apple Discussions, etc. Then, on a whim, I asked ChatGPT for a suggestion.
> They are also prone to confident assertions of statements that are blatantly false.
It suggested something that referenced a nonexistent property of a standard UIKit class. It wouldn't even compile.
It was quite positive that this would fix my issue.
After refreshing a couple of times (and also mentioning the first one sucked), it finally gave me something that still didn't work, but gave me an avenue that I could explore, and that finally yielded the solution.
I suspect that the reason for the confident assertion that included an illegal property, was because it was trained on Swift code that was extended (I do that a lot, myself. In fact, I ended up creating my own extension that added the nonexistent property).
Modern programming languages allow you to extend even language primitive types, and JavaScript has allowed that kind of thing for many years.
It may be a while before we can entirely trust ChatGPT to give us all the answers.
To be fair, however, it did help me to land upon the correct solution, but I still had to fire up some candlepower of my own.
It may be a while before we can entirely trust ChatGPT to give us all the answers.
I find the whole premise of ChatGPT -- or indeed, neural network systems in general -- to be untrustworthy. That does not mean useless! But more like a first approximation solution that still needs to be checked.
I expect that, with more and more training, GPT systems will make fewer and fewer errors, but I still find the fundamental concept something that needs to be double-checked through some other (possibly automated) means.
That's basically the boat I'm in. I really like it, it's a good bit of software, but it's like having an error-prone person to bounce ideas off of.
It doesn't always directly give me an answer I can use, but it's extraordinarily handy to be able to copypaste a large server log into it, ask what it means, and give suggestions based on that. Often just knowing the proper-nouns to look up can be immensely useful.
GPT4 is significantly better at code fyi, not sure if you used default ChatGPT (which is 3.5turbo). Premium ChatGPT gives you access to GPT4, but the API gives you even more access with the ability to edit the "system" prompt which Sam Altman's said is very important and which I can attest to from my testing.
I did and do try it with GPT4 and the API and a custom prompt to help me in rust.
Expecting only knowledge from 2021 it is hit and miss. Helped me to write some scripts but still fails to recognize tasks that are not possible in the language and halluciantes a plausible solution.
It tends to suggest non non compiling and (after being asked to do so) "corrects" bugs (some real some imaginary) and sometimes gets stuck.
Helpful yes, but not yet the invincible overlord some people imagine it to be.
There’s also the code interpreter functionality that’s in alpha access. Not everyone has access to it but it allows you to upload spreadsheets, code files, etc and then uses gpt4 to try and interpret and fix the code.
Using ChatGPT to get facts is like going to a restaurant to buy your groceries. Sure, you could pick out slices of tomato and scrape the salt off of your steak, bring it all home and cook a meal - but the supermarket next door can offer the same ingredients without the hassle.
Just as the point of a restaurant is to process and cook the ingredients for you, the point of ChatGPT is the preprocessing of facts and information.
If you just want fresh, unprepared facts, go to Google or Wikipedia; if you want an informational meal, go to ChatGPT.
This critique asserts that ChatGPT propagates inaccuracies. While there may be instances where this holds true, the given example does not substantiate this claim. The article alleges that ChatGPT stated Russia has launched bears into space, an assertion that is evidently false. However, my own interaction with ChatGPT 4.0 contradicts this. When I posed the same question, the AI unequivocally responded that no nation has, in fact, sent bears to space. Thus, in this instance, the claims of the article are unfounded.
This comment seems to be perpetuating another misunderstanding about how LLMs like ChatGPT work: the idea that they have a "model of the world" that is consistent, but sometimes incorrect.
They do not.
There is no reason why two different people, asking the same question of ChatGPT, would necessarily get the same inaccurate answer. There is also no reason why they would get similar correct answers.
What they will get is answers that are statistically likely based on the text in ChatGPT's training data.
Most articles like this are comparing older versions of chatgpt and pointing out flaws. ChatGPT-3.5 for example still states Russia has sent bears into space.
The author of the article judges the future of LLMs and AI using the current state of affairs, as is these technologies are not evolving very quickly. GPT3 and GPT4 are only 3 years apart, yet GPT4 is a completely different beast. And we are talking only about LLMs, we haven't yet seen what's coming in vision for example. Also I have not heard OpenAI or any serious AI visionary say ChatGPT is the holy grail of AI research. It is undoubtedly one of the best consumer ready technologies we have seen in the field to date but it just a tiny little piece of a massive infrastructure that is still in its very early stages. It's like saying HTML was the holy grail of the Internet. It was an important component of it but without fiber optics, microchips, radio, smartphones, algorithms, etc the Internet would not have been possible the way we know it today.
I'm trying to imagine a few next generations from now.
Then imagine full voice recognition and voice synthesis attached to it, on every phone, every watch, every car.
The news, the headlines, on every page you visit is going to be custom re-written by AI for every bit of data they can get on you, even your IP address if they have nothing else.
Any job not replaced by AI is going to be AI-assisted, it is inevitable.
This article is from March 2023, which in LLM terms is pretty old! The "How many bears have the Russians sent into space?" question still returns hallucinations with ChatGPT 3.5 but, unsurprisingly, gets a correct answer from GPT-4.
This feels like classic Hype Cycle content: the higher we peak on Inflated Expectations the lower we'll find ourselves on the Trough of Disillusionment.
The have completely no idea what they are writing about
But maybe this is a good part. We’ll have less uneducated journalists and more opinions of actual specialists. Making those articles sound fresh, interesting, and easy to read with the help of LLMs.
FFS, once more for those in the back since people who don't understand how logic works keep pushing this dumb "point" on this board for various things:
"Successful technology was once argued against" is not evidence that "any technology that is argued against will be successful".
That should not be a hard concept to grasp, yet it seems to elude so many. A -> B does not mean B -> A. That's literally like day two of any class on formal logic, and only because the first day was spent discussing the syllabus.
Thing is, there will always be morons. LLMs are just the latest thing to be moronic about. That's why I really don't appreciate these downer articles very much. I think LLMs are awesome and they've now become a part of my everyday life, but it's a little much to say they're the "holy grail."
At the same time, ChatGPT does deserve hype. It's an achievement that we've essentially invented the ship's computer from Star Trek, and have it available for free or at an affordable price to everyone. When people reduce ChatGPT down to little more than statistics, I don't think they actually understand the technology or cognition at even a basic level, and a lot of the critiques towards ChatGPT come off as butthurtedness.
The first electric motors were written off as being less powerful than steam engines.
So it goes.
I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.
Early adopters will use the technology and do amazing things with it. Unions are already pushing back on AI (truckers, federal employees in Canada, writers in Hollywood) and maybe rightly so. At the same time, dismissing these technologies because they don't meet your high standards yet is probably foolish.