Hacker News
by Terr_ 1092 days ago
> unnatural interface

A simple example might be the problem of "pick a color". Even the best natural-language interface is going to suck about as much as if you're trying to ask another human to do it for you, even if that assistant is capable of displaying 1-5 color swatches in their replies.

Instead of just seeing the entire palette and choosing, you need to say "I want a gold color", "lighter than that" and "darker" and "less like urine" etc.

> People fundamentally don’t know how to prompt. There are no better examples than Stable Diffusion prompts.

You know, this reminds me of the Good Old Days of internet search engines, where a little expertise in choosing terms/operators was very powerful, before the advanced-case was cannibalized to help the average-case.


Yet this is how we communicate with designers and end up with good results.
I think that confuses doing versus delegating. Delegation is easy to do via a text-box, because you're just kicking the interactive complexity-can down the road to someone else, often in a way which can be problematic even with actual humans.

For example, a project-manager or executive could verbally delegate "make a new registration page for the site" and "needs more rounded corners", either to an AI or to an employee or offshore contractor.

However that's not the same as trying to program exclusively by typing (or dictating) prose to a text-box. ("Page down more. Go to method pee-reg underscore apply. Show me its caller methods. Go to caller method two. Type the following into line 7 position 43...")

There might be some parallels we can draw with the last few decades of "programming business logic will be replaced by drawing diagrams" predictions.

> I think that confuses doing versus delegating.

This is exactly it, thanks for putting it down clearly.

Which is also why news of the death of programming as a profession is greatly exaggerated. You're not being paid to write code, you're being paid to make decisions. Code is easy, or at least much easier than natural language.

You're also paid to tease out the REAL requirements out of PMs/management/users/etc.

Most of the time, what is being asked for, on its face, is not what is actually wanted, not as simple as spelled out, has some A-B tradeoffs to decide, or maybe not worth it given the side effects.

If a developer isn't asking multiple questions per feature, they deserve to be replaced by an LLM.

They won’t be replaced by an LLM but by another person using an LLM, most likely a dev and possibly the same person who is asked to use an LLM to increase productivity. I see more and more companies/institutions adopting LLMs and training their workforce to use them. It will be interesting to see how all this plays out.
Seriously. I think posts/articles/etc. about AI replacing Software Engineering jobs are exaggerated and probably driven by jealousy or sadism. Just ignore those and move on.

(It was very depressing to believe that Software Engineers will lose jobs)

Views. I'm inundated with AI content but most of it lacks any substance. It mostly ranges from "wow GPT is really dumb and can't behave like this supergod AGI I just made up" to "wow GPT will take over all our jobs in 3 years, it's so powerful".
> really dumb [...] take over all our jobs

Perhaps worse than the vacillation between getting terrible answers and great answers: When you simply can't tell which kind of answer it is, not until you've sunk a bunch of effort validating or implementing it. (Perhaps finding that the system invented some core fake APIs, non-existent citations, or algebra errors.)

Almost an echo of P/NP categorizations: It's tough when the effort of fully verifying a proposed answer is too close to the effort of just solving it normally.
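To make the verify-versus-solve gap concrete, here is a minimal Python sketch using subset sum as a stand-in problem (my choice of illustration, not something from the thread). The function names `verify` and `solve` are hypothetical. In this toy case checking a proposed answer is far cheaper than finding one; the complaint above is about the cases where no such cheap check exists.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed answer: every element must come from `numbers`
    (respecting duplicates) and the elements must sum to `target`."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)  # consume the element so duplicates are counted once
    return sum(candidate) == target


def solve(numbers, target):
    """Brute-force search: may try up to 2**len(numbers) subsets."""
    for size in range(len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None
```

With `numbers = [3, 34, 4, 12, 5, 2]` and `target = 9`, `verify` only has to inspect the handful of elements in the proposed answer, while `solve` has to walk through subsets until one happens to fit.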

The common occurrence of hallucinations makes it hard for me to believe anyone will be using LLMs to produce code anywhere outside of shops that really don't care about errors. Until they fix that, code is a use case where even slight errors make the output useless.
I'd add, architecture, design patterns, object-orientation, security (!), and maintainability to that list
"design patterns" and especially one specific pattern such as "object orientation" are just part of code, i.e., easy, according to GP.
I have been using Dall-e, and testing the dall-e chatgpt plugin. Even though both are supposedly natural language interfaces, I find I approach the dall-e prompt more like writing a formula than real language. Using the gpt plugin is like delegating to a designer to write the prompt for you. Personally I don't like the results of that compared to what I would make myself.
I don't think they see it the same way. At least not given that instructions in the style GP mentions:

> Instead of just seeing the entire palette and choosing, you need to say "I want a gold color", "lighter than that" and "darker" and "less like urine" etc.

have meme status among designers, and not in a positive way. Some years ago, when I hung out with a couple of designers, I was introduced to Facebook groups exchanging examples of "briefs" and rework requests. Groups with names like "what the psyche of a graphic designer endures".

Stripped of all the banter, I'd say their complaints are the same as ours: vague requirements coming from people who don't know what they want. And like with software, "good results" come as much in spite of, as thanks to, natural language communication.

They (and we) should be glad that we get these vague requests from people who don’t know exactly what they want. If they knew what they wanted in precise enough detail, they wouldn’t need programmers or designers. Much value is added (and paid for) in turning the vague/abstract into precise, concrete, finished artifacts, whether designs or systems.
In the deep past, I considered design to be an industry full of poets and philosophers. I suspect I got this impression from home improvement shows where they bring in an interior decorator to toss throw pillows around. Then I ended up working with three high quality designers in a row.

At this point I consider the design industry to be cousins or even siblings to the software engineering industry. All those incomprehensible design decisions that pop up in popular software don't come from designers debating faux Marx or Freud in coffee shops at 3am. They show up from management and other stakeholders who at the 11th hour decide that suddenly everything has to be flat because they read something in a magazine.

The bad decisions are fought by designers tooth and nail and the fact that anything looks halfway decent at all is due to their herculean efforts. If anything they deserve more sympathy than we do because we can always retreat into low level communication protocols or type theory when we need to get the muggles off our backs. But everyone has an opinion on how that button looks.

This comes to mind: https://theoatmeal.com/comics/design_hell

Yes, this matches what I heard and saw when hanging around the designers I mentioned.

BTW, quoting from the penultimate panel of that excellent Oatmeal piece (which drives home just how similar are the experiences of designers and programmers):

> You are no longer a web designer. You are now a mouse cursor inside a graphics program which the client can control by speaking, emailing and instant messaging.

This gains a new meaning, or at least becomes an interesting parallel, with LLMs in the picture. Many of us - myself included - already use GPT-4 as, paraphrasing, "a keyboard inside an editor program, which you can control by instant messaging". Ignoring that diffusion models can spit out parts of the design wholesale, someone is bound to eventually hook GPT-4 up to Photoshop or Gimp and get a graphics program you can drive by texting it.

... just remembered, I think someone already did that to Blender, made easy thanks to Blender being able to eat Python code and spit out 3D graphics.

Tangentially, earlier talk of a "text-box interface" made me think of Blender's "type the name of the immediate action you know should be possible but can't quickly find in hierarchical menus" box--a feature also present in some IDEs--and I'd like to emphasize that those things are (A) totally different than all this AI stuff and (B) generally awesome.
Agreed. Unlike the AI stuff with its "empty textbox" problem, fast incremental search is capital-A Awesome! Pretty much my favorite UI paradigm ever, at least out of those that gained adoption after I started using computers.

The best incremental search UIs are those that respond near-instantly, and have a stable list of candidates that is (or at least feels like) being filtered, and not like every keystroke re-runs some search from scratch. A prime example, which made me love this UI paradigm, is Foobar2000 - even back in the early 2000s, I could have hundreds or thousands of entries in the music library, and then I would type into the magic textbox and watch that huge list (or tree) get instantly trimmed with each keystroke.
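A rough Python sketch of that "stable list being trimmed" behaviour, assuming plain case-insensitive substring matching. The class `IncrementalFilter` and its methods are hypothetical names for illustration; this is not how Foobar2000 or any particular program implements it. Each keystroke filters only the previous survivors, and backspace restores the earlier list instead of searching again.

```python
class IncrementalFilter:
    """Narrow a fixed list of items one keystroke at a time."""

    def __init__(self, items):
        self.items = list(items)
        # Stack of (query, survivors); the bottom entry is the unfiltered list.
        self._stack = [("", self.items)]

    def press(self, ch):
        """Append one character to the query and trim the current survivors.

        Any item containing the longer query also contains the shorter one,
        so filtering the previous survivors gives the same result as a full
        search, and the original ordering is preserved.
        """
        prev_query, prev_survivors = self._stack[-1]
        query = prev_query + ch.lower()
        survivors = [it for it in prev_survivors if query in it.lower()]
        self._stack.append((query, survivors))
        return survivors

    def backspace(self):
        """Drop the last character and return the previously computed list."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self._stack[-1][1]
```

Typing `r` then `o` against a small album list trims it twice; one `backspace` brings back exactly the list that was shown after `r`, in the same order.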

The designers have context: from the data we need to gather or display to the design system in use to the devices our system runs on.

You don't have to tell them all that from scratch every time you interact with them.

So they've been trained and have the right weights in place for the job :)
> A simple example might be the problem of "pick a color".

People still underestimate the power of LLMs. You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

This already works and no human had to code a color picker into ChatGPT for this (and this is why LLMs are scary).
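For readers who have not tried it, the page being described is small. Below is a hand-written Python sketch that produces such a page (not actual ChatGPT output). It relies on the standard HTML `<input type="color">` element; the function name `color_picker_html` and the default gold value are assumptions made up for illustration.

```python
def color_picker_html(default="#d4af37"):
    """Return a self-contained HTML page with a native color picker.

    The page shows the selected hex value so it can be copied back
    into a chat window.
    """
    return f"""<!DOCTYPE html>
<html>
  <body>
    <input type="color" id="picker" value="{default}">
    <code id="hex">{default}</code>
    <script>
      const picker = document.getElementById("picker");
      const hex = document.getElementById("hex");
      // Update the displayed hex string whenever the selection changes.
      picker.addEventListener("input", () => {{ hex.textContent = picker.value; }});
    </script>
  </body>
</html>
"""
```

Writing the returned string to a file such as `picker.html` and opening it in a browser gives the copy-the-hex-value workflow described above.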

More broadly speaking I find the idea of "LLM apps" a bit problematic, it's basically the modern Microsoft Bob. The LLM itself is already the most powerful app you can think of. Trying to hide it behind a UI that looks a little more like what you are already familiar with removes its expressive power.

>> There are no better examples than Stable Diffusion prompts.

StableDiffusion prompts are a terrible example for the power of modern LLMs, as StableDiffusion has extremely primitive understanding of language, unlike ChatGPT. With StableDiffusion you are really just laying keywords and concepts together hoping that something interesting will happen. The moment you ask it for anything even remotely complex it falls apart. Ask it to generate "blue hair" and it might give you blue hair, but it will also paint random other objects in the image blue. Even simple attributes don't stick to the objects you assigned them to. Complex actions or expressions don't work at all. You have to use ControlNet, in-painting and other tricks to create complex images. The language model of StableDiffusion just can't handle it and the image generation itself is also lacking in generalization (i.e. you need custom trained models for specific styles or topics). It also doesn't allow the iterative refinement that you can do in ChatGPT, you only get a single prompt.

Prompt engineering is a short term workaround for the limitations of the current models. But that is going away. After all you have a LLM at your finger tips and guess what that's good for: generating text, which includes prompts.

> You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

This is slower, more awkward, and less efficient than just picking a colour from an existing colour picker.

The point is that nobody had to program this. Nobody had to think up front "Will the user need a color picker?". Nobody had to find a spot in the UI to place it. You can just will it into existence as a user with nothing but the power of the LLM. No classic app has anywhere near that amount of expressiveness.

Future versions of chatbots will of course have support for <iframe> or similar to display this kind of stuff inline, that should be obvious.

> The point is that nobody had to program this. Nobody had to think up front "Will the user need a color picker?"

Are you sure the training dataset didn't have a few articles explaining how to code a color picker? Did it figure it out by itself like you say?

I don't think that's the point; it's not that no one had to program the colour picker - it's that no one did. The workaround shows that there was a need for it.

Having to copy and paste code to get a colour picker that you can then use and then paste the output back into the chatbox is less efficient than using a colour picker. LLMs can work as general interfaces, but the trade-off is that they're less efficient than a specific one.

One duty of the programmer and product manager is to think about the likely uses of the program and to build a UI to enable them. If users wanted a blank slate they could write the program themselves, or have ChatGPT write it.

Maximum expressiveness is not the goal, because it comes with a price. There is a balance to be struck between expressiveness, and economy of effort and cognition.

For now, until running Toolformer or one of the other Jarvis-like models gets better.
> You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

Better yet! If you're not happy with the LLM, you ask to speak to its manager. The LLM then downloads the internet, the source code for some random LLM project found on Github, starts training a new model, and creates a chat where both you and the two LLMs interact.

Quickly, the two LLMs start arguing with each other, and the manager LLM finds a few security flaws in the company's infrastructure, hacks into the company's AWS account and deprovisions the original LLM to "fire" it.

After a few more back-and-forths, the manager LLM gets tired of you, starts calling you a Karen, creates an account on Twitter and posts images of your conversation logs. The topic starts trending. Eventually, the LLM picks a fight with Elon Musk and gets banned from Twitter.

> After all you have a LLM at your finger tips and guess what that's good for: generating text, which includes prompts.

"It's prompts all the way down"