by jyrkesh 1184 days ago
Everyone's been talking about how ChatGPT will disrupt search, but looking at the launch partners, I think this has the potential to completely subvert the OS / App Store layer. On some level, how much do I need an OpenTable app if I can use voice/text input and a multi-modal response that will ultimately book my reservation?

Not saying mobile's going away, but this could be the thing that does to mobile what mobile did to desktop.

14 comments

People said this about Alexa/Siri et al and it didn’t happen. ChatGPT is way better at understanding you, so that’s a big boost. It could be a great tool/assistant but it probably won’t replace apps.

The problems with those other platforms that this doesn’t address include:

- discoverability. How do you learn what features a service supports? On a GUI you can just see the buttons, but on a chat interface you have to ask and poke around conversationally.

- Cost/availability. While a service is server bound, it can go down, and for LLMs specifically the cost per request is high. Can you imagine it costing $0.10 a day per user to use an app? LLMs can’t run locally yet.

- Branding. OpenTable might want to protect their brand and wouldn’t want to be reduced to an API. It goes both ways - Alexa struggled with differentiating skills and user data from Amazon experiences.

- monetization. The conversational UI is a lot less convenient to include advertisements, so it’s a lot harder for traditionally free services to monetize.

Edit: plugins are still really cool! But probably won’t replace the OSes we know.
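
For a rough sense of scale on the cost point, here is a back-of-the-envelope sketch. The $0.10 per user per day figure is the one from the comment above; the one-million-user count and the function name are assumptions picked purely for illustration.

```python
def serving_cost_dollars(cents_per_user_per_day: int, users: int, days: int) -> float:
    """Total cost in dollars, computed in integer cents to avoid float rounding."""
    total_cents = cents_per_user_per_day * users * days
    return total_cents / 100


# Hypothetical app: one million daily users at 10 cents per user per day.
monthly = serving_cost_dollars(10, 1_000_000, 30)
yearly = serving_cost_dollars(10, 1_000_000, 365)

print(f"30 days:  ${monthly:,.0f}")
print(f"365 days: ${yearly:,.0f}")
```

Under those assumptions that comes to $3,000,000 for 30 days and $36,500,000 for a year, which is a lot to absorb for a service that is free to the user today.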

Good points - but I fundamentally disagree here.

The whole ecosystem, culture and metaphor of having a 'device' with 'apps' is to enable access to a range of solutions to your various problems.

This is all going to go away.

Yes, there will always be exceptions and sometimes you need the physical features of the device - like for taking photos.

Instead, you'll have one channel which can solve 95% of your issues - basically like having a personalised, on-call assistant for everyone on the planet.

Consider the friction when consumers grumble about streaming services fragmenting. They just want one. They don't want to subscribe to 5+.

In 10 years, kids will look back and wonder why on earth we used to have these 'phones' with dozens or hundreds of apps installed. 'Why would you do that? That is so much work? How do you know which you need to use?'

If there was one company worrying about change, I would think it would actually be Apple. The iPhone has long been a huge driver of sales and growth - as increasing performance requirements have pushed consumers to upgrade. Instead, I think the increasing relevance of AI tools will invert this. Consumers will be looking for smaller, lighter, harder-wearing devices. Why do you need a 'phone' with more power? You just need to be able to speak to the AI.

An interface based on voice only has an issue: people tend not to live alone. As children they live with their parents. As adults, many want to live with a significant other.

Having somebody else in the house speaking out loud each time they want information from the internet could become annoying.

Apart from a mind-reading device, so far I don't see a better solution to this problem than text input with a physical keyboard or a virtual keyboard on the device.

> Consider the friction when consumers grumble about streaming services fragmenting. They just want one. They don't want to subscribe to 5+.

I think you just proved it won't happen anytime soon.

Consumers obviously would prefer a "unified" interface. Yet we can't even get streaming services to all expose their libraries to a common UI - which is already built into Apple TV, Fire TV, Roku, and Chromecast. Despite the failure of the streaming ecosystem to unify, you expect every other software service to unify their interfaces?

I think we'll see more features integrated into the operating system of devices, or integrated into the "Ecosystem" of our devices - first Maps was an app, then a system app, now calling an Uber is supported in-map, and now Siri can do it for you on an iPhone. But I think it's a long road to integrate this universally.

> If there was one company worrying about change, I would think it would actually be Apple.

I agree that Apple has the most to lose. Google (+Assistant/Bard) has the best opportunity here (but they'll likely squander it). They can easily create wrappers around services and expose them through an assistant, and they already have great tech regarding this. The announcement of Duplex was supposed to be just that for traditional phone calls.

Apple also has a great opportunity to build it into their operating system, locally. Instead of leaning into an API-first assistant model, they could use an assistant to topically expose "widgets" or views into existing on-device apps. We already see bits of it in iMessages, on the Home Screen, share screen and my above Maps example. I think the "app" as a unit of distribution of code is a good one, and here to stay, and the best bet is for an assistant to hook into them and surface embedded snippets when needed. This preserves the app company's branding, UI, etc. and frees Apple from having to play favorites.

Edit: Apple announcing LLM optimizations already indicates they want this to run on Apple silicon, not the cloud.

Great point about failure to unify (or intentionally preventing it).

The space is in a land-grab phase, where everyone wants to position themselves as the next Google, and control the channel.

Will be interesting to see how this all plays out.

> In 10 years, kids will look back and wonder why on earth we used to have these 'phones' with dozens or hundreds of apps installed. 'Why would you do that? That is so much work? How do you know which you need to use?'

Phones with apps have been around for 29 years. I'm calling BS on your prediction now.

I was thinking the same way, but here's where I could imagine things being different this time (fully aware that I, like anyone else, am just guessing about where we'll end up):

- Discoverability. I think we'll move into a situation where the AI will have the context to know what you will want to purchase. It'll read out the order and the specials and you just confirm or indicate that you'd like to browse more options. (In which case the Chat window could include an embedded catalogue of items)

- Cost/availability - With the number of people working in this area, I don't think it'll be too long before we're able to get a lighter-weight model that can run locally on most smartphones.

- Branding - This is a good point, but also, I imagine a brand is more likely to let itself get eaten, if the return will be a constant supply of customers.

- Monetization - The entire model will change, in the sense that AI platforms will revenue share with the platforms they integrate with to create a mutually beneficial relationship with the suppliers of content. (Since they can't exist without the content both existing and being relevant)

I spent a lot of time working on the product side in the Voice UI space, and therefore have a lot of opinions. I could totally end up with a wrong prediction, and my history may make me blind to changes, but I think a chat assistant is a great addition to a rich GUI for simple tasks.

> I think we'll move into a situation where the AI will have the context to know what you will want to purchase

My partner who lives in the same house as me can't figure out when we need toilet paper. I'm not holding my breath for an AI model that would need a massive and invasive amount of data to learn and keep up.

Also, Alexa tried to solve this on a smaller scale with the "by the way..." injections and it's extremely annoying. Think about how many people use Alexa for basically timers and the weather and smart home. They're all tasks that are "one click" once you get in the GUI, and have no lists and minimal decisions... Timer: 10 min, weather: my house, bedroom light: off. These are cases where the UI necessarily embeds the critical action, and a user knows the full request state.

This is great for voice, because it allows the user to bypass the UI and get to the action. I used to work on a voice assistant and lists were the single worst thing we had to deal with because a customer has to go through the entire selection. ChatGPT has a completely different use case, where it's great for exploring a concept since the LLM can generate endlessly.

I think generative info assistants truly are the sweet spot for LLMs and chat.

> in the sense that AI platforms will revenue share with the platforms they integrate with to create a mutually beneficial relationship with the suppliers of content.

Like Google does with search results? (they don't)

Realistically, Alexa, Google Assistant, and Siri all failed to build out these relationships beyond apps. Companies like to simply sell their attention for ads, and taking a handout from the integrator requires either less money, or an expensive chat interface.

Most brands seem to want to monetize their own way, in control of themselves, and don't want to be a simple API.

> LLMs can’t run locally yet.

"Yet" is a big word here when it comes to the field as a whole. I got Alpaca-LoRA up and running on my desktop machine with a 3080 the other day and I'd say it's about 50% as good as ChatGPT 3.5 and fast enough to already be usable for most minor things ("summarize this text", etc) if only the available UIs were better.

I feel like we're not far off from the point where it'll be possible to buy something of ChatGPT 3.5 quality as a home hardware appliance that can then hook into a bunch of things.

Agree that Alpaca is important. I got the smallest one running on a pathetic notebook… 2 cores, 8GB RAM. It was slow. It was sloppy. But it worked. Getting these things running on GPU/NPU will be very compelling, especially if we don’t hit a wall on compression. I think a sweet spot exists where consumer clients are powerful enough and models are small enough to deliver value and privacy.
I think you're missing the fact that the LLM could also generate the frontend on the fly by e.g. spitting out frontend code in a markup language like QML. What's a multi-activity Android app if not an elaborate notebook? Branding can just be a parameter.

Sure, maybe OpenTable would like to retain control. But they'll probably just use the AI API to implement that control and run the app.

Chat can be an interface, but it's also essentially a universal programming language which can be put behind (or generate itself) any kind of interface.
Who's to say, though, that it'll always stay a text format?

They could bring in calendar, payment, other UI functionality...

Basically they could rethink how everything is done on the Web today.

It almost certainly won't take the form of a text format. Impersonating a chatbot or a search engine GUI is just the fastest way for OpenAI to accumulate a few hundred million users, to leave the competition for user data and metadata behind.
It would likely take the form of just-in-time software.
>The conversational UI is a lot less convenient to include advertisements

How so? Surely people are going to ask this thing for product recommendations, just recommend your sponsors.

This moves the advertisement opportunity to the chat owner. If you want to use chat (+API) to book a table at a restaurant, then the reservation-API company loses a chance to advertise to you vs. if you used a dedicated reservation web app.
Oh I see what you mean, yes. The reservation API company will have to get money through other means (either from the user via OpenAI or from the restaurant).

Honestly I see this as a positive change, I'd rather be the customer than the product.

I’m already seeing advertisements in New Bing.
We have reached "peak UI". In the future we're not going to need every service to build four different versions of their app for every major platform. They can just build a barebones web app and the AI will use it for you, you'll never have to even see it.
IMO you won't even need to build the app, you'll just provide a data model and some natural language descriptions of what you want your product to do.
That’s how this plugin system works already.
I don't think this is the case. You provide an API spec but you also have to provide the implementation of that API. ChatGPT is basically a concierge between your API and the user.
I think the API is meant to be the data model in this scenario. The point is that you design the API around the task that it solves, rather than against whatever fixed spec OpenAI publishes. And then you tell ChatGPT, "here's an API, make use of it for ..." - and it magically does, without you having to write any plumbing.
It sounds like you might have it backwards. The API spec is published by you and the AI consumes it.
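
To make the split being argued about here concrete, here is a toy sketch: the developer publishes a spec, the developer also writes the implementation, and the assistant side only reads the spec and forwards calls. Everything named below (`SPEC`, `book_table`, `call_operation`, the field layout) is hypothetical and is not the actual plugin format; it only illustrates the "concierge" arrangement.

```python
from typing import Any, Callable, Dict

# Published by the plugin developer: a description the assistant can read.
# Hypothetical layout, invented for this example.
SPEC: Dict[str, Any] = {
    "name": "reservations",
    "operations": {
        "book_table": {
            "description": "Reserve a table at a restaurant.",
            "parameters": {"restaurant": str, "party_size": int},
        }
    },
}


# Also written by the plugin developer: the code that does the real work.
def book_table(restaurant: str, party_size: int) -> Dict[str, Any]:
    return {"status": "confirmed", "restaurant": restaurant, "party_size": party_size}


IMPLEMENTATION: Dict[str, Callable[..., Dict[str, Any]]] = {"book_table": book_table}


def call_operation(operation: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """What the 'concierge' does: check arguments against the spec, then forward."""
    declared = SPEC["operations"][operation]["parameters"]
    if set(arguments) != set(declared):
        raise ValueError(f"expected parameters {sorted(declared)}")
    for name, expected_type in declared.items():
        if not isinstance(arguments[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return IMPLEMENTATION[operation](**arguments)


print(call_operation("book_table", {"restaurant": "Thai Place", "party_size": 4}))
```

The model's only job in this picture is turning "table for four at Thai Place" into the `arguments` dict; the spec tells it what is allowed, and the business logic stays on the developer's side.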
It isn't yet. For example, Wolfram Alpha is an app that GPT is communicating with, and it actually exists.
Except you won't if you want to make money because then you don't have a business
Unless you charge for providing services of value to people.
And that is why some people think this AI leap could be as big as the internet.
Charge people for installing your plugin into ChatGPT.
I mean yeah, you'll have to provide a data model (and data) that other people don't have.
I mean, if you consider mobile we might already be down from the peak. In the sense that the interface bandwidth has shrunk to whatever 2 fingers can handle.
Headless app is the way to go.
This is what Apple's Siri was meant to be. Apple bought Siri, a spin-off of SRI International (Siri = SRI), and at launch it was meant to include the ability to book restaurants etc. (thereby bypassing search), but somehow those capabilities were never released, and today Siri still can't even control the iPhone!

My hot take on ChatGPT plugins is a bit mixed - should be very powerful, and maybe a significant revenue generator, but at the same time it doesn't seem in the least bit responsible. We barely understand ChatGPT itself, and now it's suddenly being given the ability to perform arbitrary actions!

Google's assistant, on the other hand, did figure out the reservation trick. Reportedly "book a table for four people at [restaurant name] tomorrow night" actually works, though I've never tried it.
Interesting - I wasn't aware of that. Will have to Google to see what else it may be capable of. Google really needs to update assistant with something LLM based though, and it seems Bard really isn't up to the job.
This doesn’t take a huge level of “AI” by any means. It’s really simple pattern matching in a very limited context.
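
As a rough illustration of how little is needed for that one phrasing, here is a sketch of the pattern-matching approach. The regex, the `parse_booking` name, and the small number table are all made up for this example and say nothing about how any real assistant is built.

```python
import re
from typing import Dict, Optional, Union

# Tiny hand-written vocabulary; a real system would need far more coverage.
WORD_NUMBERS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

BOOKING_RE = re.compile(
    r"book a table for (?P<size>\w+)(?: people)? at (?P<restaurant>.+?) "
    r"(?P<when>tonight|tomorrow night)$",
    re.IGNORECASE,
)


def parse_booking(utterance: str) -> Optional[Dict[str, Union[int, str]]]:
    """Return the booking slots if the utterance fits the fixed template, else None."""
    match = BOOKING_RE.match(utterance.strip())
    if match is None:
        return None
    size_text = match.group("size").lower()
    size = int(size_text) if size_text.isdigit() else WORD_NUMBERS.get(size_text)
    if size is None:
        return None
    return {
        "party_size": size,
        "restaurant": match.group("restaurant"),
        "when": match.group("when").lower(),
    }


print(parse_booking("book a table for four people at Thai Palace tomorrow night"))
print(parse_booking("find me somewhere nice for dinner"))
```

The second call returns `None`, which is the "very limited context" part: anything outside the template simply isn't understood.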
Siri's capabilities are somehow much closer to Google Bard than ChatGPT (have tried all of them).
That's a bit harsh on Bard, but yes - just got access today and it's surprisingly weak.
Bard just gives up on coding questions.
All chatbots require AI to really be useful. This just did not exist until a few years ago.
This isn’t really true. Siri could easily be more useful in its current state if it had a larger library of intents and API access.
I'm kind of skeptical of this simply because people were saying the same thing about chatbots back when there was a lot of hype around Messenger. Sure, they weren't as advanced as what we have now, but they were fundamentally capable of the same things.

Not only did the hype not pan out, but it feels as if they were completely forgotten.

In a nutshell that's why I'm still largely dismissive of anything related to GPT. It's 2016-2018 all over again. Same tech demos. Same promises. Same hype. I honestly can't see the big fundamental breakthroughs or major shifts. I just see improvements, but not game-changing ones.

This is a healthy skepticism but the difference was that using Messenger chatbots was a disjointed, clunky experience that felt slower than just a few taps in the OpenTable app. Not to mention that their natural language understanding was only marginally better than Siri at best.

In this scenario, it seems dramatically faster to type or speak "Find me a dinner reservation for 4 tomorrow at a Thai or Vietnamese restaurant near me." than to browse Google Maps or OpenTable. It then comes down to the quality and personalization of the results, and ChatGPT has a leg up on Google here just due to the fact that their results are not filled with ads and garbage SEO bait.

>but they were fundamentally capable of the same things.

This is not the case. The difference between current state-of-the-art NLP and chatbots 3 years ago is so massive, it has to be seen as qualitative. Pre-GPT-3, computers did not understand language and no commercial chatbot had any AI. Now computers can understand language.

> Now computers can understand language.

"understand"

If I tell it to do X, and it does X, for all practical purposes it means that it understood what I said.
It was taught to react in a specific way to specific words, the same way you can train a dog to bark at the phrase "quantum physics".
It can invent words, and then correctly use them to compose.

https://news.ycombinator.com/item?id=35268950

Well, if it can bark out the right answer, it's an impressive damned dog...
At this point ChatGPT is blowing our human ability to use language so far out of the water on so many levels, I'd argue we should start putting quotes around our human ability to "understand" language. GPT-4 has already eclipsed us when it comes to language.
This time it works.
Yeah being able to generate media/text is what excites me about these models, more than using my voice or a text input to do X instead of a webpage which has a GUI and buttons and text boxes.
I'm afraid that it has the potential to subvert everything. Looking at the plugins initiative, it's not hard to imagine a world where separate websites, and browsing websites as we know it, don't exist; instead one interacts with the model(s) directly to do what needs to be done: asking for news, buying a new present for the kids, discussing car models within a selected price range, etc.
It’s an interesting idea, but I’m not convinced the average person has any interest in texting or talking out loud to their device to complete all their computing tasks. It’s slower for most things.

Also I think there will be little interest in delegating that level of control to a single source for anything that’s important. For example, say I’ve got 5k to spend on home theatre gear, why on earth should I trust Shopify’s AI to suggest what I need and find the best price? The incentives aren’t in alignment.

That’s what went wrong with Alexa. They figured people would buy stuff via voice, but nobody trusted it for that.
As long as the services do get paid, this is not much different than what we have now

Google gatekeeps everything currently: it's in the browser, the search button, the phone, etc. Having chatbots instead of Google is better.

> and a multi-modal response that will ultimately book my reservation?

How is it going to do that? OpenTable's value isn't in the tech, a 15 yo could implement that over the weekend. Or maybe ChatGPT can be put in the restaurant, and somehow figure out how to seat you. And then you'd have a human talking to ChatGPT and ChatGPT talking to another ChatGPT to complete the task. That'll be interesting, but otherwise this is overly complicated for all parties involved.

Anything preventing Bard/etc from using these plugins as well?

Would be nice to keep the ecosystem open.

There’s nothing stopping any LLM-backed chatbot from using plugins; the ReAct pattern discussed recently on HN is a general pattern for incorporating them.

The main limits are that unless they are integral and trained-in (which is less flexible), each takes space in the prompt, and in any case the interaction also takes token space, all of which reduces the token space available to the main conversation.
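
For anyone who hasn't seen the pattern, here is a stripped-down sketch of a ReAct-style loop. The "model" is a hard-coded stub standing in for a real LLM call, and the tool name, the `Action: tool[input]` line format, and the canned responses are all assumptions made up for this example.

```python
import re
from typing import Callable, Dict, List


# Hypothetical tool; a real plugin would call out to a service here.
def find_tables(query: str) -> str:
    return f"2 tables found for '{query}'"


TOOLS: Dict[str, Callable[[str], str]] = {"find_tables": find_tables}
ACTION_RE = re.compile(r"^Action: (\w+)\[(.*)\]$")
FINAL_PREFIX = "Final Answer: "
OBS_PREFIX = "Observation: "


def stub_model(transcript: List[str]) -> str:
    """Stand-in for an LLM: requests a tool first, answers once it has an observation."""
    if transcript[-1].startswith(OBS_PREFIX):
        return FINAL_PREFIX + transcript[-1][len(OBS_PREFIX):]
    return "Action: find_tables[thai, party of 4]"


def react_loop(question: str, model: Callable[[List[str]], str] = stub_model,
               max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(reply)  # every model turn is added to the prompt
        if reply.startswith(FINAL_PREFIX):
            return reply[len(FINAL_PREFIX):]
        match = ACTION_RE.match(reply)
        if match is None:
            transcript.append(OBS_PREFIX + "could not parse action")
            continue
        tool_name, tool_input = match.groups()
        tool = TOOLS.get(tool_name)
        result = tool(tool_input) if tool else f"unknown tool {tool_name}"
        transcript.append(OBS_PREFIX + result)  # and so is every tool result
    return "no answer within step limit"


print(react_loop("Find a Thai dinner reservation for 4"))
```

Note how the transcript grows by two lines per tool call, on top of whatever tool descriptions a real prompt would carry up front. That growth is the token-space cost described above.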

My experience with Bard is it probably isn't smart enough to figure out on its own how to use these. Google would probably have to do special finetuning/hardcoding for the plugins that they want to work.
Bard is a tard so I doubt it. Google is done.
I'm not sure if the word "subvert" is right; the OS is still there, the App Store is still there, and nothing they've demonstrated will measurably impact revenue from these sources (the iOS App Store's largest source of revenue, by far, is games. Some estimates put Games as like 25% of all of Apple's revenue).

I think there's also a global challenge (actually, opportunity IS the right word here) that by-and-large the makers of operating systems aren't the ones ahead in the language AI game right now. Bard/Google may have been close six months ago, but six months is an eternity in this space. Siri/Apple is so far behind that it's not looking likely they can catch up. About a week ago a Windows 11 update was shipped which added a Bing AI button to the Windows 11 search bar; but Windows doesn't really drive the zeitgeist.

I wonder if 2023/4 is the year for Microsoft to jump back into the smartphone OS game. There may finally be something to the idea of a more minimalist, smaller voice-first smartphone that falls back on the web for application experiences, versus app-first.

Yes, it will change the application layer. LLMs allow using a natural user interface (NUI) as the universal interface to invoke under-utilized data & APIs. We can now develop a super-app rather than many one-off apps. I have been exploring this idea since 2021 and would love to connect with anyone who wants to work in this space.
Most (if not all) of those apps are free though, you supply them as a convenience because you know that smartphone owners spend money. The host OS loses access to that info, and that is used to target better ads in certain phone platforms.
Why do you think Apple would care? It came out in the Epic trial that 80%+ of App Store revenue comes from in-app purchases in pay-to-win games and loot boxes.

Apple doesn’t make any money from OpenTable.

I’m surprised Apple hasn’t improved Siri with a model like this. Currently it’s just trash but with a GPT-style model behind it you could actually get it to do things.
Why is it surprising? The amount of CPU resources server side to work on a billion iOS devices at any sort of performance level is extreme.

The limitation on making Siri more useful is just adding to and refining its intent system. It already integrates with Wolfram Alpha, for instance.

I agree, it's a revolutionary new better UX paradigm.
So, what's your prediction? Windows Phone gets ChatGPT, or the other phone OS makers add a Microsoft chat app?