Hacker News
by Darthy 891 days ago
In the linked keynote, Jesse Lyu says that LLMs won't help us actually do tasks: there are currently no so-called "agents" that can do something as simple as booking a flight, so the best way to do it is still to click the buttons yourself.

Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you. I'm not sure this is the right approach - if it is successful, it will lead to more centralisation around Rabbit.

I agree this is a problem, but I feel a better approach would be a market of agents that, for a small fee, actually handle the whole transaction for you. Multiple parties might say they can buy Delta flight DL101 tomorrow at 21:10, each for a different price: some might be services like the Rabbit LAM, others booking platforms, and some might even be the airlines themselves. An agent-concierge that you choose once up front would then look at all the parties, and pick and buy the right flight for you. This turns the problem into an open market, where good, speedy service is rewarded and prices keep falling. And if the Rabbit LAM gets outcompeted by an ever better, speedier solution, that would be a good thing. (This would also let us move away from our current dreaded attention-based economy, where e.g. a booking website tries to exploit your required presence during waiting times. The LAMs would also solve that, but, like I said, let's not move towards more centralisation.)
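The concierge idea can be sketched as a quote-and-select step. This is a minimal illustration under stated assumptions, not anything Rabbit, Delta, or a booking platform actually exposes: `Offer`, `choose_offer`, the `max_eta` cutoff, and all prices are hypothetical, and a real concierge would also have to handle payment, trust, and failed bookings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Offer:
    provider: str       # who executes the purchase: a LAM, a booking site, an airline...
    flight: str         # flight identifier, e.g. "DL101"
    price: float        # total price, including the provider's fee (hypothetical)
    eta_seconds: float  # how long the provider claims the booking will take


# Hypothetical: a provider is any function that quotes a flight, or returns None.
Provider = Callable[[str], Optional[Offer]]


def choose_offer(flight: str, providers: List[Provider], max_eta: float = 60.0) -> Optional[Offer]:
    """Ask every provider for a quote and pick the cheapest one that is fast enough."""
    quotes = [q for p in providers if (q := p(flight)) is not None]
    eligible = [q for q in quotes if q.eta_seconds <= max_eta]
    if not eligible:
        return None
    # Cheapest wins; ties go to the faster provider.
    return min(eligible, key=lambda q: (q.price, q.eta_seconds))


providers: List[Provider] = [
    lambda f: Offer("rabbit-lam", f, 312.0, 45.0),
    lambda f: Offer("booking-site", f, 305.0, 20.0),
    lambda f: Offer("airline", f, 305.0, 10.0),
    lambda f: None,  # a provider that cannot sell this flight
]

best = choose_offer("DL101", providers)
print(best.provider)  # airline
```

Because every provider answers the same quote request, a new and speedier provider can be dropped into the list without the concierge changing at all, which is the open-market property the comment is after.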

4 comments

> Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you. I'm not sure this is the right approach - if it is successful, it will lead to more centralisation around Rabbit.

The LAM is a genius hack to get around the thousands of closed gardens that apps have created.

It also may have been easier than teaching an LLM how to make tons of API calls, and if done right, I presume their LAM adapts to UI changes, rather than relying on integrations written against breaking or deprecated APIs.

You’re much more impressed than I am.

90% of use cases will be covered by an official API.

They’ll cover the other 10% with “teaching”: essentially, you tell the AI what the lazily written markup actually means, and they save it into an automation template. QA teams have only been doing that for the better part of 3 decades.

I know a company that employs a building full of 1,000 people doing nothing but performing one click. They put a human in the scraping/automation loop so they don’t violate the site’s or service’s TOS.

Good luck with that.

Uber wants people in its app, they want to show ads for their subscription membership services, and they want to upsell you on services, and they want you to see sponsored restaurants first when you order food. Uber wants to own the relationship with customers, so they can ~exploit the customers more~ extract more value.

VC-backed and publicly listed companies need endless growth, and user-centric systems like what Rabbit is offering break those business models apart. That's why I predict everyone is going to fight super hard against making UIs that just get shit done.

Agree with everything you're saying.

Watching the keynote, I found myself thinking how unhappy Uber would be with skipping over interacting with them entirely: there's no "Uber experience" you have when you're in the car, so what do you get from Uber that any random company with a tie in to Rabbit can't get you?

Option 1: a shift in devices/models like Rabbit pulls the magic carpet out from under companies like Uber, and everything becomes purely transactional.

Option 2: a Rabbit-like market creates an exclusivity-based need, to ensure Uber is the number-one (or only) rideshare choice, so it doesn't matter that customers aren't "experiencing" Uber. Uber relinquishes the experience to the agent (unlikely).

Option 3: Uber et al. wage war against agents and make their use impossible.

But if we're not careful this will circle back to apps/silos.

What I'd like to see is the Smalltalk approach: data providers that are able to send/receive messages, and can be connected together to achieve a goal. Even better if the connecting is done by the "machine" after I issue a command.
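A rough sketch of that message-passing style, assuming hypothetical providers (`FlightSearch`, `Wallet`) and a hypothetical `send` method: each provider only answers messages addressed to it by name, loosely like Smalltalk message sends and its `doesNotUnderstand:` hook. The wiring in `book_cheapest` is written by hand here; the comment's wish is for the machine to do that wiring itself once a command is issued.

```python
from typing import Any, Dict, Optional


class Provider:
    """Base class: a data provider that answers messages sent to it by name."""

    def send(self, selector: str, *args: Any) -> Any:
        handler = getattr(self, selector, None)
        if not callable(handler):
            return self.does_not_understand(selector, *args)
        return handler(*args)

    def does_not_understand(self, selector: str, *args: Any) -> Any:
        # Rough analogue of Smalltalk's doesNotUnderstand: hook.
        raise AttributeError(f"{type(self).__name__} does not understand #{selector}")


class FlightSearch(Provider):
    """Hypothetical provider that knows flight prices."""

    def __init__(self, schedule: Dict[str, float]) -> None:
        self.schedule = schedule  # flight number -> price

    def cheapest(self) -> str:
        return min(self.schedule, key=lambda f: self.schedule[f])

    def price_of(self, flight: str) -> float:
        return self.schedule[flight]


class Wallet(Provider):
    """Hypothetical provider that can pay for things."""

    def __init__(self, balance: float) -> None:
        self.balance = balance

    def pay(self, amount: float) -> bool:
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


def book_cheapest(search: Provider, wallet: Provider) -> Optional[str]:
    """Hand-written wiring: ask for the cheapest flight, then try to pay for it."""
    flight = search.send("cheapest")
    price = search.send("price_of", flight)
    return flight if wallet.send("pay", price) else None


search = FlightSearch({"DL101": 305.0, "UA202": 280.0})
wallet = Wallet(500.0)
print(book_cheapest(search, wallet))  # UA202
print(wallet.balance)                 # 220.0
```

Since providers only agree on message names, any other provider that understands `cheapest` and `price_of` could be swapped in without touching the wallet, which keeps the pieces from collapsing back into one app.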

Actually, agent frameworks are becoming very popular now:

https://github.com/joaomdmoura/crewAI

It's been such a long year. I still remember the month of gpt... what was it, not gpt4all... gpt... ah, whatever. The "running an LLM in a loop will solve it" approach. I'm not a big fan; I'd need to see something truly transformative.

This seems to be a LangChain wrapper, where the LangChain part is a prompt plus retrieval over a few documents.

For example, `https://github.com/joaomdmoura/crewAI-examples/tree/main/sto...` uses these tools:

```
BrowserTools.scrape_and_summarize_website,
SearchTools.search_internet,
CalculatorTools.calculate,
SECTools.search_10q,
SECTools.search_10k
```

> Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you.

https://openadapt.ai is an open-source app that runs on your local machine and clicks through interfaces for you, but only for repetitive tasks that you show it how to do.

QA teams have been doing this sort of stuff for decades. With a little know-how and an hour, you could record a user doing something in the DOM and play it back. There’s no magic here.
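The record-and-playback idea can be sketched in a few lines. This is an illustrative toy, not how OpenAdapt or any particular QA tool works: `Step`, `Recorder`, `FakePage`, and `replay` are hypothetical names, and `FakePage` stands in for a real browser so the example runs without one. Tools such as Selenium IDE do the real-browser version of this.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Step:
    action: str      # "click" or "fill"
    selector: str    # CSS selector of the target element
    value: str = ""  # text to enter; empty for clicks


class Recorder:
    """Collects user actions in order (a real recorder would hook DOM events)."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def click(self, selector: str) -> None:
        self.steps.append(Step("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.steps.append(Step("fill", selector, value))

    def to_json(self) -> str:
        return json.dumps([asdict(s) for s in self.steps])


class FakePage:
    """Hypothetical stand-in for a browser page so the sketch runs without a browser."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}
        self.clicked: List[str] = []

    def click(self, selector: str) -> None:
        self.clicked.append(selector)

    def fill(self, selector: str, value: str) -> None:
        self.fields[selector] = value


def replay(script: str, page) -> None:
    """Play a recorded script back against any object with click/fill methods."""
    for raw in json.loads(script):
        step = Step(**raw)
        if step.action == "click":
            page.click(step.selector)
        elif step.action == "fill":
            page.fill(step.selector, step.value)
        else:
            raise ValueError(f"unknown action {step.action!r}")


rec = Recorder()
rec.fill("#from", "SFO")
rec.fill("#to", "JFK")
rec.click("#search")
script = rec.to_json()

page = FakePage()
replay(script, page)
print(page.fields, page.clicked)  # {'#from': 'SFO', '#to': 'JFK'} ['#search']
```

Keeping the recorded script as plain JSON is what makes it a reusable automation template: when the site's markup changes, someone can edit the selectors by hand and replay it again.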