by alexawarrior3 856 days ago
None of these I've seen actually works in practice. Having used LLMs for software development the past year or so, even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work. I've got to go back and forth with the LLM to get anything useful and even then have to substantially modify it. I really hope there are some big advancements soon and this doesn't just collapse into another AI winter, but I can easily see this happening.

Some recent actual use cases of mine where an agent would NOT be able to help me, although I really wish it could:

1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

3. Illustrator Agent - Image generation for the above story. Images end up very "LLM" looking and often miss key elements in the story, but one thing is worst of all: no persistent characters. This is already a big problem with text, but an even bigger problem with images. Every image for the same story has a character who looks different, but I want them to be the same.

4. Publisher Agent - Package the things above together so I can get a complete package of illustrated stories on various topics, available on web/mobile for viewing and tracking progress, at varying levels.

Just some examples of where LLMs are currently not moving the needle much if at all.

6 comments

>even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work

This is certainly true for more complex code generation. But there is a lot of "rote" work that I do use GPT to generate, and I feel like that has really improved my productivity.

The other use case for AI-assisted coding is that it _really_ helps me learn certain stuff, whether it's a new language or code that someone else wrote. Oftentimes I know what I want done, but I don't know the corresponding utility functions in that language, and AI will not only be able to generate it for me but also, through the process, teach me about the existence of those things. (Some of it is wrong lol, but it's correct often enough for me to keep that habit.)

> 2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

You have to break it down into smaller steps and provide way more detail in the context than you think you need. I did a story generation experiment with "authors" that each wrote only from the perspective of one character. The characters were themselves completely generated, starting from genre, then name, character traits, etc. Then, for a given scene within a given plot, and given where you are in the story, you randomly rotate between authors for each generation and append the output to memory, although not all of the story fits in context. Each generation is only a couple hundred tokens, and you ask it to start, continue, or end the story. The context contains all of this information in a simple key:value format. Essentially, you treat the LLM like a loom and spin the story out.
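
A minimal sketch of that loop, to make the shape concrete. Everything named here is an assumption for illustration: `spin_story`, `build_context`, the field names, and `memory_limit` are hypothetical, and `fake_generate` is a stand-in for a real LLM call so the example runs offline. It is not the exact setup from the experiment.

```python
import random
from typing import Callable, Dict, List


def build_context(fields: Dict[str, str]) -> str:
    """Render the prompt context in a simple key: value format."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def spin_story(
    authors: List[Dict[str, str]],
    plot: str,
    scene: str,
    steps: int,
    generate: Callable[[str], str],
    memory_limit: int = 3,  # assumption: only the last few chunks fit in context
    seed: int = 0,
) -> List[str]:
    """Rotate randomly between authors, one short generation per step."""
    rng = random.Random(seed)
    memory: List[str] = []
    for step in range(steps):
        author = rng.choice(authors)
        if step == 0:
            instruction = "start"
        elif step == steps - 1:
            instruction = "end"
        else:
            instruction = "continue"
        context = build_context({
            "plot": plot,
            "scene": scene,
            "author": author["name"],
            "traits": author["traits"],
            # Only the tail of the story is sent; the full story stays in memory.
            "story_so_far": " ".join(memory[-memory_limit:]),
            "instruction": f"{instruction} the story",
        })
        memory.append(generate(context))
    return memory


def fake_generate(context: str) -> str:
    """Stand-in for an LLM call (hypothetical): echoes who wrote what."""
    fields = dict(line.split(": ", 1) for line in context.splitlines())
    return f"[{fields['author']} would {fields['instruction']} here]"


AUTHORS = [
    {"name": "Mira", "traits": "stubborn, curious"},
    {"name": "Tobias", "traits": "cautious, loyal"},
]

story = spin_story(AUTHORS, plot="a lost map", scene="the harbor",
                   steps=4, generate=fake_generate)
print("\n".join(story))
```

Swapping `fake_generate` for a real model call with a small token limit is the only change needed to try it for real; the rotation and the trimmed `story_so_far` are the parts doing the work.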

Usually what it produces isn't quite the best, but that's okay, because you can further refine the generation by using different system/user prompts explicitly for editing the content. I found that asking it to suggest one refinement phrased as a direct command, then feeding that command back in along with the original generation, works. This meta-prompting tends to produce changes that subjectively improve the text along whatever dimensions are specified in the system prompt.
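
Roughly, the refinement pass could look like the sketch below. The two system prompts, the `refine` function, and the `chat(system, user)` signature are all assumptions made up for illustration, and `fake_chat` is a canned stand-in for a real model so the example runs as-is.

```python
from typing import Callable

# Hypothetical system prompts; the "dimensions" to improve would go in here.
SUGGEST_SYSTEM = (
    "You are an editor. Suggest exactly one refinement, "
    "phrased as a direct command."
)
REWRITE_SYSTEM = "Apply the given command to the text. Return only the revised text."


def refine(text: str, chat: Callable[[str, str], str], rounds: int = 1) -> str:
    """Ask for one edit command, then feed it back with the original text."""
    for _ in range(rounds):
        command = chat(SUGGEST_SYSTEM, text)
        text = chat(REWRITE_SYSTEM, f"command: {command}\ntext: {text}")
    return text


def fake_chat(system: str, user: str) -> str:
    """Stand-in for an LLM call (hypothetical): one canned suggestion and edit."""
    if system == SUGGEST_SYSTEM:
        return "Remove the word 'very'."
    original = user.split("\ntext: ", 1)[1]
    return original.replace("very ", "")


draft = "The very old loom spun a very long tale."
print(refine(draft, fake_chat))
```

Keeping the suggestion and the rewrite as two separate calls is the point: each prompt stays narrow, which fits the tightly constrained approach described here.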

If you treat the composition as way more mechanical with tightly constrained generation, you get a much better, much more controlled result.

> 1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

That’s because none of the models have been trained on this. Create a dataset for this and train a model to do it and it will be able to do it.

https://www.youtube.com/watch?v=bRFLE9qi3t8

Here's the CEO of Builder.io supporting your comment: he says they tried LLMs/agents, and it didn't work. Then they collected a dataset and developed an in-house model, used only to assist with what they couldn't solve with imperative programming.

Not really. He's saying that the solution is to not have the entire process in a single model: it's better to have the model work on specific pieces that you broke down yourself, rather than feeding it the whole thing and expecting the model to break it down and generate everything correctly by itself.

One area where it has been useful for me is writing simple code in languages I am not familiar with and am not willing to learn. For example, I needed to write a small bash script to automate things in Ubuntu, and it really saved me time googling all those commands. Same with the Task Scheduler XML language. It knows the popular use cases of all the languages very well.

Besides writing boilerplate, I used AI to generate a color scheme and imagery for a charity website I built.

Why do you want it to generate web pages from images? I'm having trouble understanding the workflow here. You see a component you like on another website and want to obtain the code for it? Or if you have a design already, why not just use a Figma to Code tool?

It's not that uncommon to have a workflow where the webpage design gets built and negotiated with stakeholders/customers as a series of Photoshop images, and once they're approved, it's forwarded to developers to make a pixel-perfect implementation of that design in HTML/CSS.

Say you draw up a rough vision of things on paper, a very simple mock-up. That could be a nice use case.