by GiorgioG 521 days ago
re: LLM as UI: Given that I don't trust LLMs to be deterministic, I wouldn't trust them to make the correct API call every time I tell them to do X.
3 comments

I think most users have a fixed set of workflows which usually don't change from day to day, so why not just use LLMs as a macro builder with a natural language interface (and which doesn't require you to know the product's UI well beforehand):

- you ask LLM to build a workflow for your problem

- the LLM builds the workflow (macro) using predefined commands

- you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

- you save the workflow and can use it without any LLM agents, just clicking a button - pretty deterministic and reliable

Advantages:

- reliable, deterministic

- you don't need to learn a product's UI, you just formulate your problem using natural language
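The steps above could be sketched roughly as follows. This is only a minimal illustration, not anyone's actual product: the command names (`find_customers`, `tag`), the JSON layout of a saved workflow, and the `validate`/`run` helpers are all assumptions made up for the example. The point it shows is that once the workflow is saved, replaying it involves no LLM at all.

```python
import json
from typing import Callable

# Hypothetical predefined commands; a real product would expose its own set.
def find_customers(state: dict, region: str) -> dict:
    customers = [c for c in state["customers"] if c["region"] == region]
    return {**state, "selected": customers}

def tag(state: dict, label: str) -> dict:
    tagged = [{**c, "tags": c.get("tags", []) + [label]} for c in state["selected"]]
    return {**state, "selected": tagged}

COMMANDS: dict[str, Callable[..., dict]] = {
    "find_customers": find_customers,
    "tag": tag,
}

def validate(workflow: list[dict]) -> None:
    """Reject any step that is not a known command (e.g. a hallucinated one)."""
    for step in workflow:
        if step["command"] not in COMMANDS:
            raise ValueError(f"unknown command: {step['command']}")

def run(workflow: list[dict], state: dict) -> dict:
    """Replay a saved workflow step by step; no LLM is involved here."""
    validate(workflow)
    for step in workflow:
        state = COMMANDS[step["command"]](state, **step.get("args", {}))
    return state

# What the LLM might have produced and the user approved, stored as JSON.
saved = json.dumps([
    {"command": "find_customers", "args": {"region": "EU"}},
    {"command": "tag", "args": {"label": "newsletter"}},
])

data = {"customers": [{"name": "Ann", "region": "EU"},
                      {"name": "Bob", "region": "US"}]}
result = run(json.loads(saved), data)
print(result["selected"])
# [{'name': 'Ann', 'region': 'EU', 'tags': ['newsletter']}]
```

Running the saved workflow again on the same data gives the same result, which is the "deterministic" part; the review step still has to catch a workflow that is valid but does the wrong thing.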

> you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

This is the idea that is most valuable from my perspective of having tried to extract accurate requirements from the customer. Getting them to learn your product UI and capabilities is an uphill battle if you are in one of the cursed boring domains (banking, insurance, healthcare, etc.).

Even if the customer doesn't get the LLM-defined path to provide their desired final result, you still have their entire conversation history available to review. This seems more likely to succeed in practice than hoping the customer provides accurate requirements up-front in some unconstrained email context.

>- you review the workflow (can be an intuitive list of commands, understandable by non-specialist)

so you define a DSL that the LLM outputs, and that's the real UI

>- you don't need to learn a product's UI, you just formulate your problem using natural language

yes, you do. You have to learn the DSL you just manifested so that you can check it for errors. Once you have the ability to review the LLM's output, you will also have the ability to just write the DSL to get the desired behavior, at which point that will be faster unless it's a significant amount of typing, and even then, you will still need to review the code generated by the LLM, which means you have to learn and understand the DSL. I would much rather learn a GUI than a DSL.

You haven't removed the UI, nor have you made the LLM the UI, in this example. The DSL ("intuitive list of commands"... I guess it'll look like the Robot Framework, right? That's what human-readable DSLs tend to look like in practice) is the actual UI.

This is vastly more complicated than having a GUI to perform an action.

I never said the user must be exposed to a DSL, I think you're overcomplicating it for the sake of overcomplicating. A DSL can be used under the hood by the execution engine, but the user can be exposed to a simpler variant of it, either by clever hardcoded postprocessing of known commands when rendering the final result for human review, or maybe by using the LLM itself to summarize the planned actions (although it can hallucinate while summarizing, the chance is minuscule, especially if a user can test a saved workflow). My point was mostly about two things:

1) "it's unpredictable each time" - it won't be, if a workflow is saved and tested, because when it's run, no LLM is involved anymore in decision making

2) I did remove the UI, because I don't need to learn the UI, I just formulate my problem and the LLM constructs a possible workflow which solves my problem out of predefined commands known to the system.

Sure this is most useful for more complex apps. In our homegrown CRM/ERP, users have lots of different workflows depending on their department, and they often experiment with workflows, and today they either have to click through everything manually (wasting time) or ask devs to implement the needed workflow for them (wasting time). If your app has 3 commands on 1 page then sure, it's easier to do it using GUI.

Also IMHO it can be used alongside the GUI, it doesn't need to replace it. I think it's great for discoverability/onboarding and automation, but if you want to click through everything manually, why not.
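For what it's worth, the "hardcoded postprocessing of known commands" option could be as simple as one fixed sentence template per command. A rough sketch, where the command names, the template wording and the `describe` helper are all hypothetical:

```python
# Hypothetical command names and wording; each known command gets one fixed template.
TEMPLATES = {
    "find_customers": "Select all customers in region {region}.",
    "tag": "Add the tag '{label}' to each selected customer.",
}

def describe(workflow: list[dict]) -> str:
    """Render a workflow as a numbered list of plain sentences, with no LLM involved."""
    lines = []
    for number, step in enumerate(workflow, start=1):
        template = TEMPLATES[step["command"]]  # KeyError if the command is unknown
        lines.append(f"{number}. {template.format(**step.get('args', {}))}")
    return "\n".join(lines)

workflow = [
    {"command": "find_customers", "args": {"region": "EU"}},
    {"command": "tag", "args": {"label": "newsletter"}},
]
print(describe(workflow))
# 1. Select all customers in region EU.
# 2. Add the tag 'newsletter' to each selected customer.
```

Because the rendering is a plain lookup, the text shown for review cannot drift from what will be executed, unlike an LLM-written summary. The obvious cost is that the sentences are only as expressive as the templates.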

The bit you are missing is that "known to the system" is not enough, as the consumer I need to _verify the logic_, which means that at some level, I do have to read the DSL (just as I have to read the Java, not, in general, the actual assembly emitted by the JIT). Which means that the DSL is actually the product here (though the LLM may make it easier to learn that DSL and in some cases to write something in it).
1) You don't need to read the DSL in the raw form if you use a language model to convert it to a few paragraphs in natural language.

2) You can test the created workflow on a bunch of test data to verify it works as intended. After a workflow is created, it's deterministic (since we don't use LLMs anymore for decision making), so it will always work the same.

Sure we can expose DSL to power users as an option, but is reading the raw DSL really required for the majority of cases?

1. Now you have two problems (did the writer translate what I said correctly and did the summarizer translate what the writer wrote correctly).

2. This is absolutely true and it does help somewhat. However, writing the test cases is now your bottleneck (and you're writing them as a substitute for being able to read a reliable high-level summary of what the workflow actually is).

Natural language isn't precise enough to describe exactly what's happening. If you do try to use natural language for that purpose, trying to eliminate ambiguity, you end up with legalese. And people can't read legalese, even though it's technically "plain English".

This is the same approach we took when we added LLM capability to the low-code tool Appian. The LLM helped us generate the Appian workflow configuration file; the user reviews it, makes changes if required, and then finally publishes it.

So visual programming x.0?

I am pretty sure PLCs with ladder logic are about the limits of the traditional visual/macro model?

Word-sense disambiguation is going to be problematic with the 'don't need to learn' part above.

Consider this sentence:

'I never said she stole my money'

Now read that sentence multiple times, putting emphasis on each word, one at a time, and notice how the semantic meaning changes.

LLMs are great at NLP, but we still don't have solutions to those NLU problems that I am aware of.

I think that to keep maximum generality without severely restricting use cases, a common DSL would need to be developed.

There will have to be tradeoffs made, specific to particular use cases, even if it is better than Alexa.

But I am thinking about Rice's theorem and what happens when you lose PEM.

Maybe I just am too embedded in an area where these problems are a large part of the difficulty for macro style logic to provide much use.

You're just describing programming with the extra step of going through a high entropy and low bandwidth channel of natural language and hand waving that problem away.

We can "just" write code as well, as we have been doing for several decades.

I don't either, but this can be mitigated by adding guard rails (strictly validating input), double checking actions with the user and using it for tasks where a mistake isn't world ending.

Even then mistakes can slip through, but it could still be more reliable than a visual UI.

There are lots of horrible web UIs I would LOVE to replace with a conversational LLM agent. No #1 is Jira and so are no #2 and #3.
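To make the guard-rail idea a bit more concrete, here is a rough sketch of validating an LLM-proposed action against an allow-list and asking the user before anything destructive. The action names, the `ALLOWED` table and the `check`/`execute` helpers are invented for illustration and are not any real tracker's API.

```python
from typing import Callable

# Hypothetical allow-list: action name -> (required argument types, needs confirmation)
ALLOWED = {
    "add_comment": ({"issue": str, "text": str}, False),
    "close_issue": ({"issue": str}, True),
}

def check(action: dict) -> bool:
    """Strictly validate a proposed action; return True if the user must confirm it."""
    name, args = action.get("name"), action.get("args", {})
    if name not in ALLOWED:
        raise ValueError(f"action not allowed: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a mapping")
    schema, needs_confirmation = ALLOWED[name]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} must be {expected.__name__}")
    return needs_confirmation

def execute(action: dict, confirm: Callable[[str], bool]) -> str:
    """Run a validated action, asking the user first when the action requires it."""
    if check(action) and not confirm(f"Really run {action['name']} with {action['args']}?"):
        return "cancelled"
    return f"ran {action['name']}"  # placeholder for the real API call

print(execute({"name": "close_issue", "args": {"issue": "X-1"}}, lambda question: False))
# cancelled
```

Mistakes that pass validation and that the user waves through can still slip by, as said above; this only narrows what the model is able to do.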

They are deterministic at 0 temperature
At zero temp there is still non-determinism due to sampling and the fact that floating-point addition is not associative, so you will get varying results due to parallelism.
(Disclaimer: I know literally nothing about LLMs.) Wouldn't there still be issues of sensitivity, though? Like, wouldn't you still have to ensure that the wording of your commands stays exactly the same every time? And with models that take less discrete data (e.g. ChatGPT's new "advanced voice model" that works on audio directly), this seems even harder.
s/advanced voice model/advanced voice mode/ (too late for me to edit my original comment)
They are pretty deterministic then but they are also pretty useless at 0 temperature.
Not for the leading LLMs from OpenAI and Anthropic.
Not really, not in practice. The order of execution is non-deterministic when running on a cluster or a GPU, or on more than one CPU core, and rounding errors propagate differently on each run.
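A quick way to see the rounding issue for yourself, using plain Python floats (IEEE 754 doubles). Floating-point addition is commutative but not associative, so the grouping of the additions matters, and a parallel reduction does not fix that grouping from run to run. This only demonstrates the arithmetic; it is not a model of what any particular GPU kernel or inference stack does.

```python
# Commutative: swapping the two operands never changes the result.
print(0.1 + 0.2 == 0.2 + 0.1)  # True

# Not associative: regrouping the same three numbers changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# With very different magnitudes, the order of the additions can change the answer entirely.
small_lost = (1e16 + 1.0) - 1e16   # the 1.0 is rounded away before the subtraction
small_kept = (1e16 - 1e16) + 1.0   # the big values cancel first, so the 1.0 survives
print(small_lost, small_kept)      # 0.0 1.0
```

When many such sums feed into scores that are nearly tied, a difference in the last bits can be enough to change which token comes out on top.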