by bob1029 531 days ago
I think the goldilocks path is to make the user the agent and use the LLM simply as their UI/UX for working with the system. Human (domain expert) in the loop gives you a reasonable chance of recovering from hallucinations before they spiral entirely out of control.

"LLM as UI" seems to be something hanging pretty low on the tree of opportunity. Why spend months struggling with complex admin dashboard layouts and web frameworks when you could wire the underlying CRUD methods directly into LLM prompt callbacks? You could hypothetically make the LLM the exclusive interface for managing your next SaaS product. There are ways to make this just as robust and secure as an old school form punching application.

5 comments

It's quite tedious to have to write (or even say) full sentences to express intent. Imagine driving a car with a voice interface, including accelerator, brake, indicators and so on. Controls are less verbose and dashboards are more information rich than linear text.

It's difficult to be precise. Often it's easier to gauge things by looking at them while giving motor feedback (e.g. turning a dial, pushing a slider) than to say "a little more X" or "a bit less Y".

Language is poorly suited to expressing things in continuous domains, especially when you don't have relevant numbers that you can pick out of your head - size, weight, color etc. Quality-price ratio is a particularly tough one - a hard numeric quantity traded off against something subjective.

Most people can't specify up front what they want. They don't know what they want until they know what's possible, what other people have done, started to realize what getting what they want will entail, and then changed what they want. It's why we have iterative development instead of waterfall.

LLMs are a good start and a tool we can integrate into systems. They're a long, long way short of what we need.

re: LLM as UI: Given that I don't trust LLMs to be deterministic, I wouldn't trust them to make the correct API call every time I tell it to do X.
I think most users have a fixed set of workflows which usually don't change from day to day, so why not just use LLMs as a macro builder with a natural language interface (and which doesn't require you to know the product's UI well beforehand):

- you ask LLM to build a workflow for your problem

- the LLM builds the workflow (macro) using predefined commands

- you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

- you save the workflow and can use it without any LLM agents, just clicking a button - pretty deterministic and reliable

Advantages:

- reliable, deterministic

- you don't need to learn a product's UI, you just formulate your problem using natural language
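The steps above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: the command names (`set_field`, `add_to_field`), the JSON step format, and the hardcoded `llm_output` string standing in for a real model response are all assumptions made for the example.

```python
import json
from typing import Any, Callable

State = dict[str, Any]

# Hypothetical predefined commands; a real system would call its own CRUD methods.
def set_field(state: State, field: str, value: Any) -> State:
    return {**state, field: value}

def add_to_field(state: State, field: str, amount: float) -> State:
    return {**state, field: state.get(field, 0) + amount}

COMMANDS: dict[str, Callable[..., State]] = {
    "set_field": set_field,
    "add_to_field": add_to_field,
}

def parse_workflow(llm_output: str) -> list[dict]:
    """Accept only steps that use predefined commands; reject everything else."""
    steps = json.loads(llm_output)
    for step in steps:
        if step.get("command") not in COMMANDS:
            raise ValueError(f"unknown command: {step.get('command')!r}")
        if not isinstance(step.get("args"), dict):
            raise ValueError("each step needs an 'args' object")
    return steps

def describe(steps: list[dict]) -> list[str]:
    """Render the steps as a plain numbered list for human review (no LLM involved)."""
    lines = []
    for number, step in enumerate(steps, 1):
        args = ", ".join(f"{key}={value!r}" for key, value in step["args"].items())
        lines.append(f"{number}. {step['command']}({args})")
    return lines

def run_workflow(steps: list[dict], state: State) -> State:
    """Replay the saved steps; no model is consulted at this point."""
    for step in steps:
        state = COMMANDS[step["command"]](state, **step["args"])
    return state

# Stand-in for what an LLM might propose (assumed format).
llm_output = (
    '[{"command": "set_field", "args": {"field": "status", "value": "active"}},'
    ' {"command": "add_to_field", "args": {"field": "credits", "amount": 10}}]'
)

saved = parse_workflow(llm_output)        # the LLM builds the workflow
print("\n".join(describe(saved)))         # you review it
print(run_workflow(saved, {"credits": 5}))  # later runs just replay the saved steps
```

Once `saved` is stored, running it never touches the model again, which is where the "reliable, deterministic" claim comes from. The review step is only as good as the `describe` rendering, though.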

> you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

This is the idea that is most valuable from my perspective of having tried to extract accurate requirements from the customer. Getting them to learn your product UI and capabilities is an uphill battle if you are in one of the cursed boring domains (banking, insurance, healthcare, etc.).

Even if the customer doesn't get the LLM-defined path to provide their desired final result, you still have their entire conversation history available to review. This seems more likely to succeed in practice than hoping the customer provides accurate requirements up-front in some unconstrained email context.

>- you review the workflow (can be an intuitive list of commands, understandable by non-specialist)

so you define a DSL that the LLM outputs, and that's the real UI

>- you don't need to learn a product's UI, you just formulate your problem using natural language

yes, you do. You have to learn the DSL you just manifested so that you can check it for errors. Once you have the ability to review the LLM's output, you will also have the ability to just write the DSL to get the desired behavior, at which point that will be faster unless it's a significant amount of typing, and even then, you will still need to review the code generated by the LLM, which means you have to learn and understand the DSL. I would much rather learn a GUI than a DSL.

You haven't removed the UI, nor have you made the LLM the UI, in this example. The DSL ("intuitive list of commands"... I guess it'll look like Robot Framework, right? That's what human-readable DSLs tend to look like in practice) is the actual UI.

This is vastly more complicated than having a GUI to perform an action.

I never said the user must be exposed to a DSL; I think you're overcomplicating it for the sake of overcomplicating. A DSL can be used under the hood by the execution engine, but the user can be exposed to a simpler variant of it, either by clever hardcoded postprocessing of known commands when rendering the final result for human review, or by using the LLM itself to summarize the planned actions (it can hallucinate while summarizing, but the chance is minuscule, especially if a user can test a saved workflow). My point was mostly about two things:

1) "it's unpredictable each time" - it won't be, if a workflow is saved and tested, because when it's run, no LLM is involved anymore in decision making

2) I did remove the UI, because I don't need to learn the UI, I just formulate my problem and the LLM constructs a possible workflow which solves my problem out of predefined commands known to the system.

Sure, this is most useful for more complex apps. In our homegrown CRM/ERP, users have lots of different workflows depending on their department, and they often experiment with workflows, and today they either have to click through everything manually (wasting time) or ask devs to implement the needed workflow for them (wasting time). If your app has 3 commands on 1 page then sure, it's easier to do it using a GUI.

Also IMHO it can be used alongside a GUI; it doesn't need to replace it. I think it's great for discoverability/onboarding and automation, but if you want to click through everything manually, why not.

The bit you are missing is that "known to the system" is not enough, as the consumer I need to _verify the logic_, which means that at some level, I do have to read the DSL (just as I have to read the Java, not, in general, the actual assembly emitted by the JIT). Which means that the DSL is actually the product here (though the LLM may make it easier to learn that DSL and in some cases to write something in it).
1) You don't need to read the DSL in the raw form if you use a language model to convert it to a few paragraphs in natural language.

2) You can test the created workflow on a bunch of test data to verify it works as intended. After a workflow is created, it's deterministic (since we don't use LLMs anymore for decision making), so it will always work the same.

Sure we can expose DSL to power users as an option, but is reading the raw DSL really required for the majority of cases?

This is the same approach we took when we added LLM capability to Appian, a low-code tool. The LLM helped us generate the Appian workflow configuration file; the user reviews it, makes changes if required, and then publishes it.
So visual programming x.0?

I am pretty sure PLCs with ladder logic are about the limits of the traditional visual/macro model?

Word-sense disambiguation is going to be problematic with the 'don't need to learn' part above.

Consider this sentence:

'I never said she stole my money'

Now read that sentence multiple times, putting emphasis on each word, one at a time, and notice how the semantic meaning changes.

LLMs are great at NLP, but we still don't have solutions to those NLU problems that I am aware of.

I think that to keep maximum generality without severely restricting use cases, a common DSL would need to be developed.

There will have to be tradeoffs made, specific to particular use cases, even if it is better than Alexa.

But I am thinking about Rice's theorem and what happens when you lose PEM.

Maybe I just am too embedded in an area where these problems are a large part of the difficulty for macro style logic to provide much use.

You're just describing programming with the extra step of going through a high entropy and low bandwidth channel of natural language and hand waving that problem away.

We can "just" write code as well, as we have been doing for several decades.

I don't either, but this can be mitigated by adding guard rails (strictly validating input), double checking actions with the user and using it for tasks where a mistake isn't world ending.

Even then mistakes can slip through, but it could still be more reliable than a visual UI.
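A rough sketch of what those guard rails could look like, assuming the LLM proposes an action name plus arguments. The action names, the `ALLOWED_ACTIONS` table, and the `confirm` callback are all made up for illustration; nothing here is a real Jira or LLM API.

```python
from typing import Callable

# Hypothetical allow-list: action name -> (expected argument types, needs confirmation?)
ALLOWED_ACTIONS: dict[str, tuple[dict[str, type], bool]] = {
    "add_comment": ({"ticket_id": int, "text": str}, False),
    "close_ticket": ({"ticket_id": int}, True),
}

def validate_call(action: str, args: dict) -> None:
    """Strictly validate an LLM-proposed call before anything is executed."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    schema, _ = ALLOWED_ACTIONS[action]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    for name, expected_type in schema.items():
        # Exact type match, so "1" or True is not accepted where an int is expected.
        if type(args[name]) is not expected_type:
            raise ValueError(f"argument {name!r} must be {expected_type.__name__}")

def execute(action: str, args: dict, confirm: Callable[[str], bool]) -> str:
    """Run a validated action, double checking risky ones with the user first."""
    validate_call(action, args)
    _, needs_confirmation = ALLOWED_ACTIONS[action]
    if needs_confirmation and not confirm(f"Run {action} with {args}?"):
        return "cancelled"
    # A real system would call the underlying API here; this sketch only reports.
    return f"executed {action}"

# Usage: the user declines the destructive action, so nothing happens.
print(execute("add_comment", {"ticket_id": 1, "text": "hi"}, confirm=lambda q: False))
print(execute("close_ticket", {"ticket_id": 1}, confirm=lambda q: False))
```

Validation catches malformed or out-of-scope calls, but not a well-formed call that does the wrong thing, which is why the confirmation step is there for anything risky.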

There are lots of horrible web UIs I would LOVE to replace with a conversational LLM agent. No. 1 is Jira, and so are No. 2 and No. 3.

They are deterministic at 0 temperature
At zero temp there is still non-determinism due to sampling and the fact that floating point addition is not associative, so you will get varying results due to parallelism.
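To make the floating point part concrete: strictly speaking, IEEE 754 addition is commutative but not associative, so it is the grouping of the additions that changes the rounding. A parallel reduction can group the same numbers differently from run to run. A tiny illustration in plain Python (nothing LLM-specific, just the arithmetic):

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible grouping
right_to_left = a + (b + c)   # another grouping of the same three numbers

print(a + b == b + a)                  # True: swapping operands is fine
print(left_to_right == right_to_left)  # False: regrouping changes the rounding
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
```

The difference here is one unit in the last place, but across billions of accumulations it can be enough to flip which token has the highest score when two candidates are nearly tied.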
(Disclaimer: I know literally nothing about LLMs.) Wouldn't there still be issues of sensitivity, though? Like, wouldn't you still have to ensure that the wording of your commands stays exactly the same every time? And with models that take less discrete data (e.g. ChatGPT's new "advanced voice model" that works on audio directly), this seems even harder.
s/advanced voice model/advanced voice mode/ (too late for me to edit my original comment)
They are pretty deterministic then but they are also pretty useless at 0 temperature.
Not for the leading LLMs from OpenAI and Anthropic.
Not really, not in practice. The order of execution is non-deterministic when running on a cluster, a GPU, or more than one CPU core, and rounding errors propagate differently on each run.
I had the same epiphany about LLM as UI while trying to build a front end for an image enhancer workflow I built with Stable Diffusion. I had just about fully built out a Chrome extension and then realized I should just build a 'tool' that llama can interact with and use Open WebUI as the front end.

quick demo: https://youtu.be/2zvbvoRCmrE

> I think the goldilocks path is to make the user the agent and use the LLM simply as their UI/UX for working with the system

That's a funny definition to me, because doing so would mean the LLM is the agent, if you use the classic definition for "user-agent" (as in what browsers are). You're basically inverting that meaning :)

> "LLM as UI" seems to be something hanging pretty low on the tree of opportunity.

Yes if you want to annoy your users and deliberately put roadblocks to make progress on a task. Exhibit A: customer support. They put the LLM in between to waste your time. It’s not even a secret.

> Why spent months struggling with complex admin dashboard layouts

You can throw something together, and even auto generate forms based on an API spec. People don’t do this too often because the UX is insufficient even for many internal/domain expert support applications. But you could and it would be deterministic, unlike an LLM. If the API surface is simple, you can make it manually with html & css quickly.
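As a rough sketch of the "auto generate forms based on an API spec" idea: the `SPEC` dictionary below is a made-up, simplified stand-in for a real API description (it is not OpenAPI), and the type mapping is an assumption for the example.

```python
from html import escape

# Hypothetical, simplified endpoint description (not a real spec format).
SPEC = {
    "path": "/customers",
    "method": "post",
    "fields": [
        {"name": "email", "type": "string", "required": True},
        {"name": "age", "type": "integer", "required": False},
        {"name": "newsletter", "type": "boolean", "required": False},
    ],
}

# Assumed mapping from spec types to HTML input types.
INPUT_TYPES = {"string": "text", "integer": "number", "boolean": "checkbox"}

def render_form(spec: dict) -> str:
    """Turn the endpoint description into a plain HTML form, deterministically."""
    action = escape(spec["path"], quote=True)
    method = escape(spec["method"], quote=True)
    lines = [f'<form action="{action}" method="{method}">']
    for field in spec["fields"]:
        name = escape(field["name"], quote=True)
        input_type = INPUT_TYPES.get(field["type"], "text")
        required = " required" if field.get("required") else ""
        lines.append(
            f'  <label>{name} <input name="{name}" type="{input_type}"{required}></label>'
        )
    lines.append('  <button type="submit">Submit</button>')
    lines.append("</form>")
    return "\n".join(lines)

print(render_form(SPEC))
```

The same spec always yields the same form, which is the determinism point. It also shows the UX limitation: there is no grouping, help text, or validation feedback beyond what the spec happens to carry.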

Overuse of web frameworks has completely different causes than “I need a functional thing” and thus it cannot be solved with a different layer of tech like LLMs, NFTs or big data.

> Yes if you want to annoy your users and deliberately put roadblocks to make progress on a task. Exhibit A: customer support. They put the LLM in between to waste your time. It’s not even a secret.

No this is because they use the LLM not only as human interface but also as a reasoning engine for troubleshooting. And give it way less capability than a human agent to boot. So all it can really do is serve FAQs and route to real support.

In this case the fault is not with the LLM but with the people that put it there.