by alexawarrior3 856 days ago

None of these I've seen actually works in practice. Having used LLMs for software development the past year or so, even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work. I've got to go back and forth with the LLM to get anything useful and even then have to substantially modify it. I really hope there are some big advancements soon and this doesn't just collapse into another AI winter, but I can easily see this happening.

Some recent actual use cases for me where an agent would NOT be able to help, although I really wish it could:

1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

3. Illustrator Agent - Image generation for above story. Images end up very "LLM" looking, often miss key elements in the story, but one thing is worst of all: no persistent characters. This is already a big problem with text, but an even bigger problem with images. Every image for the same story has a character who looks different, but I want them to be the same.

4. Publisher Agent - Package the things above together so I can get a complete set of illustrated stories on various topics, available on web/mobile for viewing and tracking progress, at varying levels.

Just some examples of where LLMs are currently not moving the needle much if at all.

6 comments

chenxi9649 856 days ago

>even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work

This is certainly true for more complex code generation. But there is a lot of "rote" work that I do use GPT to generate, and I feel like that has really improved my productivity.

The other use case for AI-assisted coding is that it _really_ helps me learn certain stuff, whether it's a new language or code that someone else wrote. Oftentimes I know what I want done, but I don't know the corresponding utility functions in that language, and AI will not only generate the code for me but also, through the process, teach me that those functions exist. (Some of the suggestions are wrong lol, but it's correct often enough for me to keep doing this.)

okwhateverdude 856 days ago

> 2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

You have to break it down into smaller steps and provide way more detail in the context than you think you do. I did an experiment in story generation where I had "authors" that would each write only from the perspective of one of the characters. The characters were also completely generated, starting first from genre, name, character traits, etc. Then, for a given scene, within a given plot and position in the story, you randomly rotate between authors for each generation, appending the output in memory, though not all of the story fits in context. Each generation is only a couple hundred tokens, where you ask it to start/continue/end the story. The context contains all of this information in a simple key:value format. You essentially treat the LLM like a loom and spin the story out.
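A minimal sketch of the loop described above, not the code from the original experiment. It assumes a hypothetical `generate(prompt)` callable standing in for whatever LLM API you use; the field names and the `max_context_chars` budget are illustrative assumptions.

```python
import random
from typing import Callable, Dict, List, Optional


def build_context(fields: Dict[str, str]) -> str:
    """Render the prompt context as simple key:value lines."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def spin_story(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> short text
    authors: List[Dict[str, str]],   # one generated character sheet per "author"
    scene: Dict[str, str],           # genre, plot, scene, position in the story
    steps: int,
    max_context_chars: int = 2000,   # assumed budget; older text drops out of the prompt
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Spin a story out in small pieces, picking a random author for each piece."""
    rng = rng or random.Random()
    story: List[str] = []
    for step in range(steps):
        author = rng.choice(authors)
        if step == 0:
            instruction = "start"
        elif step == steps - 1:
            instruction = "end"
        else:
            instruction = "continue"
        # The full story is kept in memory; only its tail goes into the prompt.
        recent = " ".join(story)[-max_context_chars:]
        fields = dict(scene)
        fields.update({f"author_{key}": value for key, value in author.items()})
        fields["story_so_far"] = recent
        fields["instruction"] = f"{instruction} the story"
        story.append(generate(build_context(fields)))
    return story
```

In real use, `generate` would wrap a model call capped at a couple hundred output tokens, and `scene` would be updated as the plot moves from one scene to the next.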

Usually what it produces isn't quite the best, but that's okay, because you can further refine the generation by using different system/user prompts explicitly for editing the content. I found that asking it to suggest one refinement and phrase it as a direct command, then feeding that command back in with the original generation, works. This meta-prompting tends to produce changes that subjectively improve the text along whatever dimensions are specified in the system prompt.
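The refinement step can be sketched roughly like this. It is an illustration under assumptions, not the original code: `chat(system, user)` is a hypothetical stand-in for an LLM chat call, and both system prompts are made up for the example.

```python
from typing import Callable


def refine(
    chat: Callable[[str, str], str],  # hypothetical LLM call: (system, user) -> text
    draft: str,
    criteria: str,                    # the dimensions to improve, e.g. "pacing"
    rounds: int = 1,
) -> str:
    """Ask for one refinement as a direct command, then apply it to the text."""
    critic_system = (
        f"You are an editor. Judge the text on: {criteria}. "
        "Suggest exactly one refinement, phrased as a direct command."
    )
    editor_system = (
        "You are an editor. Apply the command to the text "
        "and return only the revised text."
    )
    text = draft
    for _ in range(rounds):
        # First call: get a single command. Second call: apply it to the text.
        command = chat(critic_system, text)
        text = chat(editor_system, f"Command: {command}\n\nText:\n{text}")
    return text
```

Keeping the critic and the editor as separate calls with separate system prompts is what makes each generation tightly constrained; more `rounds` just repeats the same two-call cycle on the latest text.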

If you treat the composition as way more mechanical with tightly constrained generation, you get a much better, much more controlled result.

Kerbonut 855 days ago

> 1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

That’s because none of the models have been trained on this. Create a dataset for this and train a model to do it and it will be able to do it.

carlossouza 855 days ago

https://www.youtube.com/watch?v=bRFLE9qi3t8

Here's the CEO of Builder.io supporting your comment: he says they tried LLMs/agents, and it didn't work. Then they collected a dataset and developed an in-house model, used only to assist with the parts they couldn't solve with imperative programming.

foolswisdom 855 days ago

Not really. He's saying that the solution is to not have the entire process in a single model. It's better to have the model work on specific pieces that you broke down yourself, rather than feeding it the whole thing and expecting the model to break it down and generate everything correctly by itself.

EVa5I7bHFq9mnYK 854 days ago

One area where it has been useful for me is writing simple code in languages I am not familiar with and not willing to learn. For example, I needed to write a small bash script to automate things in Ubuntu, and it really saved me time googling all those commands. Same with the Task Scheduler XML language. It knows the popular use cases of all the languages very well.

rpmisms 856 days ago

Besides writing boilerplate, I used AI to generate a color scheme and imagery for a charity website I built.

da4id 855 days ago

Why do you want it to generate web pages from images? I'm having trouble understanding the workflow here. You see a component you like on another website and want to obtain the code from it? Or if you have a design already, why not just use a Figma to Code tool?

PeterisP 855 days ago

It's not that uncommon to have a workflow where the webpage design gets built and negotiated with stakeholders/customers as a series of Photoshop images, and when they're approved, they're forwarded to developers to make a pixel-perfect implementation of that design in HTML/CSS.

gremlinsinc 855 days ago

Say you draw up your rough vision of things on paper, a very simple mock-up. That could be a nice use case.
