by kgeist 521 days ago
I think most users have a fixed set of workflows which usually don't change from day to day, so why not just use LLMs as a macro builder with a natural language interface (and which doesn't require you to know the product's UI well beforehand):

- you ask the LLM to build a workflow for your problem

- the LLM builds the workflow (macro) using predefined commands

- you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

- you save the workflow and can use it without any LLM agents, just by clicking a button - pretty deterministic and reliable

Advantages:

- reliable, deterministic

- you don't need to learn a product's UI, you just formulate your problem using natural language
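
A minimal sketch of the steps above, under stated assumptions: the command names (`find_customers`, `tag_customers`), their parameters, and the JSON step format are all hypothetical, not taken from any real product. The LLM would only produce the JSON; validation and execution are plain code.

```python
import json

# Hypothetical predefined commands; names and parameters are made up for illustration.
def find_customers(state, city):
    state["customers"] = [c for c in state["db"] if c["city"] == city]

def tag_customers(state, tag):
    for c in state["customers"]:
        c.setdefault("tags", []).append(tag)

# Registry: command name -> (function, set of required argument names).
COMMANDS = {
    "find_customers": (find_customers, {"city"}),
    "tag_customers": (tag_customers, {"tag"}),
}

def validate(workflow):
    """Report steps that use unknown commands or wrong arguments (e.g. hallucinated ones)."""
    errors = []
    for i, step in enumerate(workflow):
        name = step.get("command")
        args = step.get("args", {})
        if name not in COMMANDS:
            errors.append(f"step {i}: unknown command {name!r}")
            continue
        expected = COMMANDS[name][1]
        if set(args) != expected:
            errors.append(f"step {i}: {name} expects {sorted(expected)}, got {sorted(args)}")
    return errors

def run(workflow, db):
    """Replay a saved workflow step by step. No LLM is involved at this point."""
    state = {"db": db, "customers": []}
    for step in workflow:
        func, _ = COMMANDS[step["command"]]
        func(state, **step.get("args", {}))
    return state

# Pretend this JSON came back from the LLM and was approved and saved by the user.
saved = json.dumps([
    {"command": "find_customers", "args": {"city": "Berlin"}},
    {"command": "tag_customers", "args": {"tag": "newsletter"}},
])
```

Running it later is just `run(json.loads(saved), db)`. Anything the LLM invents that is not in `COMMANDS` gets caught by `validate` before the workflow is ever saved.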

5 comments

> you review the workflow (can be an intuitive list of commands, understandable by non-specialist) - to weed out hallucinations and misunderstanding

This is the idea that is most valuable from my perspective of having tried to extract accurate requirements from the customer. Getting them to learn your product UI and capabilities is an uphill battle if you are in one of the cursed boring domains (banking, insurance, healthcare, etc.).

Even if the customer doesn't get the LLM-defined path to provide their desired final result, you still have their entire conversation history available to review. This seems more likely to succeed in practice than hoping the customer provides accurate requirements up-front in some unconstrained email context.

>- you review the workflow (can be an intuitive list of commands, understandable by non-specialist)

so you define a DSL that the LLM outputs, and that's the real UI

>- you don't need to learn a product's UI, you just formulate your problem using natural language

Yes, you do. You have to learn the DSL you just manifested so that you can check it for errors. Once you have the ability to review the LLM's output, you also have the ability to just write the DSL yourself to get the desired behavior, and that will be faster unless it's a significant amount of typing. Even then, you still need to review the code generated by the LLM, which means you have to learn and understand the DSL. I would much rather learn a GUI than a DSL.

You haven't removed the UI, nor have you made the LLM the UI, in this example. The DSL (the "intuitive list of commands"; I guess it'll look like Robot Framework, right? That's what human-readable DSLs tend to look like in practice) is the actual UI.

This is vastly more complicated than having a GUI to perform an action.

I never said the user must be exposed to a DSL. I think you're overcomplicating it for the sake of overcomplicating. A DSL can be used under the hood by the execution engine, but the user can be shown a simpler variant of it, either through clever hardcoded postprocessing of known commands when rendering the final result for human review, or by using the LLM itself to summarize the planned actions (it can hallucinate while summarizing, but the chance is minuscule, especially if the user can test a saved workflow). My point was mostly about two things:

1) "it's unpredictable each time" - it won't be, if a workflow is saved and tested, because when it's run, no LLM is involved anymore in decision making

2) I did remove the UI, because I don't need to learn the UI, I just formulate my problem and the LLM constructs a possible workflow which solves my problem out of predefined commands known to the system.
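
To make the "hardcoded postprocessing of known commands" idea concrete, here is a rough sketch. The command names, the step format, and the sentence templates are hypothetical; the point is only that the review text is produced by fixed templates, so this rendering step cannot hallucinate.

```python
# Hypothetical templates: one human-readable sentence per known command.
TEMPLATES = {
    "find_customers": "Find all customers in {city}",
    "tag_customers": "Add the tag '{tag}' to each of them",
}

def render_for_review(workflow):
    """Turn machine-readable steps into a numbered plain-text list, without an LLM."""
    lines = []
    for number, step in enumerate(workflow, start=1):
        template = TEMPLATES.get(step["command"])
        if template is None:
            # Never guess: surface unknown steps so the reviewer sees them.
            lines.append(f"{number}. UNKNOWN COMMAND: {step['command']}")
        else:
            lines.append(f"{number}. " + template.format(**step.get("args", {})))
    return "\n".join(lines)
```

The user reviews the numbered list; the underlying step format stays under the hood for the execution engine.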

Sure, this is most useful for more complex apps. In our homegrown CRM/ERP, users have lots of different workflows depending on their department, and they often experiment with workflows. Today they either have to click through everything manually (wasting time) or ask devs to implement the needed workflow for them (wasting time). If your app has 3 commands on 1 page then sure, it's easier to do it using the GUI.

Also, IMHO it can be used alongside the GUI; it doesn't need to replace it. I think it's great for discoverability/onboarding and automation, but if you want to click through everything manually, why not.

The bit you are missing is that "known to the system" is not enough: as the consumer, I need to _verify the logic_, which means that at some level I do have to read the DSL (just as I have to read the Java but not, in general, the actual assembly emitted by the JIT). Which means that the DSL is actually the product here (though the LLM may make it easier to learn that DSL and in some cases to write something in it).

1) You don't need to read the DSL in the raw form if you use a language model to convert it to a few paragraphs in natural language.

2) You can test the created workflow on a bunch of test data to verify it works as intended. After a workflow is created, it's deterministic (since we don't use LLMs anymore for decision making), so it will always work the same.
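
A rough sketch of point 2, assuming a toy step format (`set_field` and `uppercase_field` are made-up commands standing in for whatever the real engine supports): the saved workflow is run against input/expected pairs, and each input is run twice to confirm that the result does not vary between runs.

```python
import copy

# Hypothetical stand-in for the execution engine of a saved workflow.
def run_workflow(workflow, record):
    result = copy.deepcopy(record)  # never mutate the test data
    for step in workflow:
        if step["command"] == "set_field":
            result[step["field"]] = step["value"]
        elif step["command"] == "uppercase_field":
            result[step["field"]] = str(result[step["field"]]).upper()
        else:
            raise ValueError(f"unknown command: {step['command']}")
    return result

def check_workflow(workflow, cases):
    """Return the failing cases; an empty list means every case passed."""
    failures = []
    for given, expected in cases:
        first = run_workflow(workflow, given)
        second = run_workflow(workflow, given)  # same input twice: results must match
        if first != second or first != expected:
            failures.append((given, expected, first))
    return failures
```

Someone still has to come up with the `(given, expected)` pairs, but once they exist they can be rerun after every change to the workflow.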

Sure we can expose DSL to power users as an option, but is reading the raw DSL really required for the majority of cases?

1. Now you have two problems (did the writer translate what I said correctly and did the summarizer translate what the writer wrote correctly).

2. This is absolutely true and it does help somewhat. However, writing the test cases is now your bottleneck (and you're writing them as a substitute for being able to read a reliable high-level summary of what the workflow actually is).

Natural language isn't precise enough to describe exactly what's happening. If you do try to use natural language for that purpose, trying to eliminate ambiguity, you end up with legalese. And people can't read legalese, even though it's technically "plain English".

This is the same approach we took when we added LLM capability to the low-code tool Appian. The LLM helped us generate the Appian workflow configuration file; the user reviews it, makes changes if required, and then finally publishes it.

So visual programming x.0?

I am pretty sure PLCs with ladder logic are about the limits of the traditional visual/macro model?

Word-sense disambiguation is going to be problematic with the 'don't need to learn' part above.

Consider this sentence:

'I never said she stole my money'

Now read that sentence multiple times, putting emphasis on each word, one at a time, and notice how the semantic meaning changes.
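
For illustration only, a tiny sketch that spells out the readings. The asterisk markers are just a notation chosen here for emphasis; the point is that a typed prompt carries none of them, so all the variants arrive as the same string.

```python
sentence = "I never said she stole my money"

def emphasis_variants(text):
    """Return one copy of the sentence per word, with that word wrapped in asterisks."""
    words = text.split()
    variants = []
    for i in range(len(words)):
        marked = words[:i] + ["*" + words[i] + "*"] + words[i + 1:]
        variants.append(" ".join(marked))
    return variants

for variant in emphasis_variants(sentence):
    print(variant)
```

Stripping the markers from any variant gives back the original sentence, which is all the system ever sees.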

LLMs are great at NLP, but we still don't have solutions to those NLU problems that I am aware of.

I think that to keep maximum generality without severely restricting the use cases, a common DSL would need to be developed.

There will have to be tradeoffs made, specific to particular use cases, even if it is better than Alexa.

But I am thinking about Rice's theorem and what happens when you lose PEM.

Maybe I just am too embedded in an area where these problems are a large part of the difficulty for macro style logic to provide much use.

You're just describing programming with the extra step of going through a high entropy and low bandwidth channel of natural language and hand waving that problem away.

We can "just" write code as well, as we have been doing for several decades.