by nemomarx 364 days ago
This feels like a pretty big ergonomics gap in presenting things as a chat window at all?

I worked on a very early iteration of LMs (they weren't "large" yet) in grad school 20 years ago and we drove it with a Makefile. The "prompt" was an input file and it would produce a response as an artifact. It never even occurred to us to structure it as a sequential "chat" because at that point it was still too slow. But it does make me wonder how much the UX changes the way people think about it.
It's more compelling to fundraising and hype-pushing stories to make it look as "person-like" as possible.
Or people like the familiar chat interface and they don’t want to dick around with a complicated workflow like the person above provided.

What are examples of 3rd party UIs that make these alternative, superior workflows easier?

There is the "classic" text completion interface that OpenAI used before ChatGPT. Basically a text document that you ask the LLM to extend (or insert text at a marker somewhere in the text). Any difference between your text and the AI's text is only visible in text color in the editor and not passed on to the LLM.

That does favor GP's workflow: You start the document with a description of your problem and end with a sentence like: "The following is a proposed solution". Then you let the LLM generate text, which should be a solution. You edit that to your taste, then add the sentence: "These are the 10 biggest flaws with this plan:" and hit generate. The LLM doesn't know that it came up with the idea itself, so it isn't biased towards it.
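A minimal sketch of that two-pass workflow, assuming a raw text-completion endpoint wrapped in a plain `complete(document) -> continuation` function. `complete`, `edit`, and `propose_then_critique` are hypothetical names, not any vendor's API. The point it illustrates is that the proposal is pasted back in as ordinary document text, so nothing marks it as the model's own output before the critique pass.

```python
from typing import Callable

# Hypothetical stand-in for any raw text-completion endpoint: it takes the
# whole document and returns the model's continuation. Swap in a real client.
CompleteFn = Callable[[str], str]


def propose_then_critique(
    problem: str,
    complete: CompleteFn,
    edit: Callable[[str], str] = lambda text: text,
) -> tuple[str, str]:
    """Run the propose-then-critique workflow on one plain-text document."""
    # Pass 1: describe the problem, then leave an open cue for a solution.
    doc = problem.rstrip() + "\n\nThe following is a proposed solution.\n"
    proposal = edit(complete(doc))  # the user may hand-edit the proposal here

    # Pass 2: the (edited) proposal is now just part of the document; append
    # the critique cue and let the model continue from there.
    doc += proposal.rstrip() + "\n\nThese are the 10 biggest flaws with this plan:\n"
    critique = complete(doc)
    return proposal, critique
```

With a real model, `edit` would be wherever you tweak the proposal to taste before asking for flaws.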

Of course, this style is much less popular with users, and it makes things like instruction tuning much harder. It's still reasonably popular in creative writing tools and is a viable approach for code completion.

ChatGPT is how old again? People are FAR more familiar with other interfaces. For coding, autocomplete is a great already-existing interface; products that use it don't get as much hype, though, as the ones that claim to be independent agents you're talking to. There are any number of common interfaces attached to that (like the "simplify this" right-click for Copilot) for refactoring, dealing with builds, tests, etc. There's no shortage of places where you could drop in an LLM instead of pushing everything through a "chat with me" box where you type out "refactor this to make these changes".

Or you could make the person's provided workflow not just more automatic but more integrated: generate the output, then add labels with hover text or inline overlays along the lines of "this does this," "here are alternative ways to do this," or "this might be an issue with this approach." All of that could be done much better in a rich graphical user interface than by slamming it into a chat log. (This is one of Cursor's biggest edges over ChatGPT: the interactive change highlighting and approval in my tool, in my repo, vs. a chat interface.)

In some other fields:

* email summarization is automatic or available at the press of a button, nobody expects you to open up a chat agent and go "please summarize this email" after opening a message in Gmail

* photo editors let you use the mouse to select an area and then click a button labeled "remove object" or such instead of requiring you to try to describe the edit in a chat box. sometimes they mix and match it too - highlight the area THEN describe a change. But that's approximately a million times better than trying to chat to it to describe the area precisely.

There are other scenarios we haven't figured out the best interface for because they're newer workflows. But the chat interface is just so unimaginative. For instance, I spent a long time trying to craft the right prompt to tweak ChatGPT's output when it turned a picture of my cat into a human. I couldn't find the right words to get it to understand and act on what I didn't like about the image. I'm no UX inventor, but one simple thing that would've helped is an eye-doctor-style "here are two options, click the one you like more." (Photoshop has something like this, but it's not as directed; it's more just "choose one of these, or re-roll," but at least it avoids polluting the chat context history as much.) Or let me select particular elements and change or refine them individually.

A more structured interface should actually greatly help the model, too. Instead of having just a linear chat history to digest, it would have well-tagged and categorized feedback that it could keep fresh and re-insert into its prompts behind the scenes continually. (You could also try to do this based on the textual feedback, but like I said, it seemed to not be understanding what my words were trying to get at. Giving words as feedback on a picture just seems fundamentally high-loss.)
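A rough sketch of what "well-tagged and categorized feedback" re-inserted behind the scenes might look like, assuming a simple per-element keep/change schema. `Feedback`, `build_prompt`, and the element names are hypothetical illustrations, not any product's actual design. Each turn, the prompt is rebuilt from the latest feedback per element rather than from the raw chat log, so newer feedback on an element replaces older feedback instead of piling up.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One piece of structured feedback (hypothetical schema)."""
    element: str   # which part of the output it targets, e.g. "ears"
    verdict: str   # "keep" or "change"
    note: str = ""  # what to change, if verdict == "change"


def build_prompt(request: str, history: list[Feedback]) -> str:
    """Re-render the current feedback state into the prompt on every turn."""
    # Later feedback about the same element overwrites earlier feedback, so
    # the model sees the current state, not a long, contradictory history.
    latest: dict[str, Feedback] = {}
    for fb in history:
        latest[fb.element] = fb

    keep = [f"- {fb.element}" for fb in latest.values() if fb.verdict == "keep"]
    change = [
        f"- {fb.element}: {fb.note}"
        for fb in latest.values()
        if fb.verdict == "change"
    ]

    sections = [request.strip()]
    if keep:
        sections.append("Keep these elements as they are:\n" + "\n".join(keep))
    if change:
        sections.append("Change these elements:\n" + "\n".join(change))
    return "\n\n".join(sections)
```

Clicking "keep" or "change" on a selected region would append one `Feedback` record; the model never has to reparse the user's prose about which part of the picture they meant.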

I find it hard to believe that there is any single field where a chat interface is going to be the gold standard. But: they're relatively easy to make and they let you present your model as a persona. Hard combo to overcome, though we're seeing some good signs!

This. I think it's the key.