Hacker News
by grumbel 1089 days ago
> A simple example might be the problem of "pick a color".

People still underestimate the power of LLMs. You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

This already works and no human had to code a color picker into ChatGPT for this (and this is why LLMs are scary).
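To make the round trip concrete, here is a rough sketch of it in Python. The HTML string is only a stand-in for what a chatbot might plausibly return when asked for a color picker, and the helper names (`COLOR_PICKER_HTML`, `extract_hex_color`, `hex_to_rgb`) are made up for illustration; none of this is ChatGPT's actual output or API.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for the page an LLM might generate on request.
# It relies only on the standard <input type="color"> element.
COLOR_PICKER_HTML = """<!DOCTYPE html>
<html>
<body>
  <input type="color" id="picker" value="#3366cc">
  <output id="hex">#3366cc</output>
  <script>
    const picker = document.getElementById("picker");
    const hex = document.getElementById("hex");
    // Show the chosen value so the user can copy it back into the chat.
    picker.addEventListener("input", () => { hex.textContent = picker.value; });
  </script>
</body>
</html>
"""

# Matches a six-digit hex color such as #ff8800.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def extract_hex_color(pasted_text: str) -> Optional[str]:
    """Return the first #rrggbb value in the pasted text, lowercased, or None."""
    match = HEX_COLOR.search(pasted_text)
    return match.group(0).lower() if match else None


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    """Convert a #rrggbb string into an (r, g, b) tuple of integers."""
    digits = hex_color.lstrip("#")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


if __name__ == "__main__":
    # Step 1: save the generated page and open it in a browser by hand.
    with open("color_picker.html", "w", encoding="utf-8") as f:
        f.write(COLOR_PICKER_HTML)
    # Step 2: the user pastes the picked value back as free text.
    picked = extract_hex_color("I picked #FF8800 for the header")
    if picked is not None:
        print(picked, hex_to_rgb(picked))
```

The manual copy and paste steps are the part a chat UI would have to automate to make this pleasant; the parsing side is trivial.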

More broadly speaking, I find the idea of "LLM apps" a bit problematic; it's basically the modern Microsoft Bob. The LLM itself is already the most powerful app you can think of. Trying to hide it behind a UI that looks a little more like what you are already familiar with removes its expressive power.

>> There are no better examples than Stable Diffusion prompts.

Stable Diffusion prompts are a terrible example of the power of modern LLMs, because Stable Diffusion has an extremely primitive understanding of language, unlike ChatGPT. With Stable Diffusion you are really just stringing keywords and concepts together and hoping that something interesting will happen. The moment you ask it for anything even remotely complex, it falls apart. Ask it to generate "blue hair" and it might give you blue hair, but it will also paint random other objects in the image blue. Even simple attributes don't stick to the objects you assigned them to. Complex actions or expressions don't work at all. You have to use ControlNet, in-painting and other tricks to create complex images. The language model of Stable Diffusion just can't handle it, and the image generation itself is also lacking in generalization (i.e. you need custom-trained models for specific styles or topics). It also doesn't allow the iterative refinement that you can do in ChatGPT; you only get a single prompt.

Prompt engineering is a short-term workaround for the limitations of the current models. But that is going away. After all, you have an LLM at your fingertips, and guess what that's good for: generating text, which includes prompts.

2 comments

> You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

This is slower, more awkward, and less efficient than just picking a colour from an existing colour picker.

The point is that nobody had to program this. Nobody had to think up front "Will the user need a color picker?". Nobody had to find a spot in the UI to place it. You can just will it into existence as a user with nothing but the power of the LLM. No classic app has anywhere near that amount of expressiveness.

Future versions of chatbots will of course have support for <iframe> or similar to display this kind of stuff inline; that should be obvious.

> The point is that nobody had to program this. Nobody had to think up front "Will the user need a color picker?"

Are you sure the training dataset didn't have a few articles explaining how to code a color picker? Did it figure it out by itself like you say?

I don't think that's the point; it's not that no one had to program the colour picker - it's that no one did. The workaround shows that there was a need for it.

Having to copy and paste code to get a colour picker that you can then use and then paste the output back into the chatbox is less efficient than using a colour picker. LLMs can work as general interfaces, but the trade-off is that they're less efficient than a specific one.

One duty of the programmer and product manager is to think about the likely uses of the program and to build a UI to enable them. If users wanted a blank slate they could write the program themselves, or have ChatGPT write it.

Maximum expressiveness is not the goal, because it comes with a price. There is a balance to be struck between expressiveness, and economy of effort and cognition.

For now, until running ToolFormer or one of the other Jarvis-like models gets better.

> You ask it to show you a color picker, it generates HTML code for a color picker, you copy that into your browser and you can pick your color, which you can then copy&paste back into the LLM for further processing.

Better yet! If you're not happy with the LLM, you ask to speak to its manager. The LLM then downloads the internet and the source code for some random LLM project found on GitHub, starts training a new model, and creates a chat where both you and the two LLMs interact.

Quickly, the two LLMs start arguing with each other, and the manager LLM finds a few security flaws in the company's infrastructure, hacks into the company's AWS account and deprovisions the original LLM to "fire" it.

After a few more back-and-forths, the manager LLM gets tired of you, starts calling you a Karen, creates an account on Twitter and posts images of your conversation logs. The topic starts trending. Eventually, the LLM picks a fight with Elon Musk and gets banned from Twitter.

> After all, you have an LLM at your fingertips, and guess what that's good for: generating text, which includes prompts.

"It's prompts all the way down"