Hacker News
by datadeft 460 days ago
The biggest problem I have with using AI for software engineering is that it is absolutely amazing at generating the skeleton of your code (boilerplate, really) and it sucks at anything creative. I have tried the reasoning models as well, but all of them give you subpar solutions when it comes to handling a creative challenge.

For example: what would be the best strategy to download thousands of URLs using async in Rust? It gives you OK solutions, but the final solution came from the Rust forum (the answer was written a year ago), which I assume made its way into the model.
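
For readers unfamiliar with the problem, the usual shape of an answer is bounded concurrency: start many requests but cap how many are in flight at once. The sketch below shows that pattern only. It is written in Python with asyncio rather than Rust, it is not the Rust forum answer mentioned above, and `fetch`, `download_all`, and the limit of 50 are hypothetical names and values chosen for illustration. `fetch` simulates a request instead of touching the network.

```python
import asyncio


async def fetch(url: str) -> bytes:
    """Hypothetical stand-in for a real HTTP GET; swap in an async HTTP client."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "bad" in url:
        raise ConnectionError(f"failed to fetch {url}")
    return f"body of {url}".encode()


async def download_all(urls, max_in_flight=50, fetcher=fetch):
    """Fetch every URL with at most `max_in_flight` requests running at once.

    Returns a dict mapping each URL to its body, or to the exception it raised.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(url):
        # The semaphore is the whole trick: tasks beyond the cap wait here.
        async with semaphore:
            try:
                return url, await fetcher(url)
            except Exception as exc:
                # Record the failure so one bad URL does not abort the batch.
                return url, exc

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(1000)]
    results = asyncio.run(download_all(urls, max_in_flight=50))
    failures = [u for u, r in results.items() if isinstance(r, Exception)]
    print(f"{len(results) - len(failures)} ok, {len(failures)} failed")
```

The cap matters more than the async machinery: without it, thousands of simultaneous connections tend to hit file-descriptor or server-side rate limits. Retries and per-host limits are left out to keep the sketch short.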

There is also the verbosity problem. Claude, without the concise flag on, generates roughly 10x the amount of code required to solve a problem.

Maybe I am prompting incorrectly and could somehow get the right answers from these models, but at this stage I use them as a boilerplate generator, and the actual creative problem solving remains on the human side.

7 comments

Personally I've found that you need to define the strategy yourself, or in a separate prompt, and then use a chain-of-thought approach to get to a good solution. Using the example you gave:

  Hey Chat,
  Write me some basic Rust code to download a url. I'd like to pass the url as a string argument to the file
Then test it and expand:

  Hey Chat,
  I'd like to pass a list of urls to this script and fetch them one by one. Can you update the code to accept a list of urls from a file?

Test and expand, and offer some words of encouragement:

  Great work chat, you're really in the zone today!

  The downloads are taking a bit too long, can you change the code so the downloads are asynchronous? Use the native/library/some-other-pattern for the async parts.

Test and expand...
Whew, that's a lot to type out, and you have to provide words of encouragement? Wouldn't it make more sense to do a simple search engine query for an HTTP library, then write some code yourself and provide that for context when doing more complicated things like async?

I really fail to see the usefulness in typing out long-winded prompts and then waiting for information to stream in. And repeat...

A few options.

1. Use TTS and have an LLM clean it up.

2. Use a collection of prompt templates.

I meant VTT, not TTS
I'm going the exact opposite way. I provide all important details in the prompt and when I see that the LLM understood something wrong, I start over and add the needed information to the prompt. So the LLM either gets it on the first prompt, or I write the code myself. When I get the "Yes, you are right ..." or "now I see..." crap, I throw everything away, because I know that the LLM will only find shit "solutions".
This is actually a great approach. Essentially you're using time travel to prevent misunderstandings, which prevents the context from getting clogged up with garbage.
This is the best approach, and it avoids long context windows that get the LLM confused.
I have heard a few times that "being nice" to LLMs sometimes improves their output quality. I find this hard to believe, but happy to hear your experience.

Examples include things like referring to the LLM nicely ("my dear"), saying "please" and asking nicely, or thanking it.

Do these actually work?

Well, consider its training data. I could easily see questions on sites like Stack Overflow having better quality answers when the original question is asked nicely. I'm not sure if it's a real effect or not, but I could see how it could be. A rudely asked question will have a lot of flame war responses.
I'm not sure encouragement itself is the performance enhancer; it's more that you're communicating that the model has the right "vibe" of what your end goal is.
I used to do the "hey chat" all the time out of habit, back when I thought the language model was something more like AI in a movie than what it is. I am sure it makes no difference beyond the user acting differently and possibly asking better questions if they think they are talking to a person. Now, for me, it looks completely ridiculous.
I find it really bad for bootstrapping projects, such as picking dependencies from rapidly evolving ecosystems, or for understanding more esoteric constraints like SQLite's concurrency model.

I'd argue you need to bootstrap and configure your project yourself, then give the LLM only narrow access and narrow problems to write code for: individual functions where your prompt includes the signature, individual tests, etc. Anything else and you really need to invest time in code review, lest it reconfigure some of your code in a drastic way.

LLMs are useful but they do not replace procedure.
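
To make the "narrow access" idea concrete, here is one way such a unit of work could look, as a rough sketch rather than a prescribed workflow. The function name `chunked`, its docstring, and the test are hypothetical and written for illustration: the signature, docstring, and test are the parts you would write and paste into the prompt, and the function body is the only part the model is asked to fill in.

```python
def chunked(items: list, size: int) -> list[list]:
    """Split `items` into consecutive lists of at most `size` elements.

    Raises ValueError if `size` is not positive.
    """
    # Everything above this line is supplied by the human in the prompt.
    # The body below is the narrow piece delegated to the model.
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunked():
    # A test written up front gives the review a fixed target.
    assert chunked([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
    assert chunked([], 3) == []
```

Because the contract is pinned down before the model sees it, the review reduces to checking one function body against one test, and nothing outside that function is open for the model to change.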

I agree completely with all you said; however, Claude solved a problem I had recently in a pretty surprising way.

So I’m not very experienced with Docker and can just about make a Docker Compose file.

I wanted to set up cron as a container in order to run something on a volume shared with another container.

I googled “docker compose cron” and must have found a dozen cron images. I set one up and it worked great on x86 and then failed on ARM because the image didn’t have an ARM build. This is a recurring theme with Docker and ARM, but not relevant here I guess.

Anyway, after going through those dozen or so images, none of which work on ARM, I gave up, sent the Compose file to Claude, and asked it to suggest something.

It suggested simply using the alpine base image and adding an entry to its crontab, and it works perfectly fine.

This may well be a skill issue, but it had never occurred to me that cron is still available like that.

Three pages of Google results and not a single result anywhere suggesting I should just do it that way.

Of course this is also partly because Google search is mostly shit these days.

Maybe you would have figured it out if you thought a bit more deeply about what you wanted to achieve.

You want to schedule things. What is the basic tool we use to schedule on Linux? Cron. Do you need to install it separately? No, it usually comes with most Linux images. What is your container, functionally speaking? A working Linux system. So you can run scripts on it. A lot of these scripts run binaries that come with Linux. Is there a cron binary available? Try using that.

Of course, hindsight is 20/20 but breaking objectives down to their basic core can be helpful.

With respect, the core issue here is that you lacked a basic understanding of Linux, and this is precisely the problem that many people, including myself, have with LLMs. They are powerful and useful tools, but if you don’t understand the fundamentals of what you’re trying to accomplish, you’re not going to have any idea if you’re going about that task in the correct manner, let alone an optimal one.
Honestly we are headed towards a disturbing height of inefficiency in software. Look at software today, 1000x less efficient than what we had in the 90s. Do businesses care? No, they focus on value. The average user is too stupid to care, even though all their RAM is being sucked up and their computer feels like shit.

The only thing that's keeping us from that hell is the "correct" part. The code is not going to be properly tested or consistent, making it impractical for anything substantial right now.

For Claude, set up a custom prompt which should have whatever you want + this:

"IMPORTANT: Do not overkill. Do not get distracted. Stay focused on the objective."

As I understand it, 'reasoning' is a very misleading term. As far as I can tell, AI reasoning is a step to evaluate the chosen probabilities. So maybe you will get fewer hallucinations, but it still doesn't make AI smart.
Yeah, "reasoning" just tells the AI to take an extra planning step.

In my experience, before "reasoning" became an option, if you ask it a question that takes a decent amount of thinking to solve, but also tell the model "Just give me the answer", you're FAR more likely to get an incorrect answer.

So "reasoning" just tells the model to first come up with a plan to solve a problem before actually solving it. It generates its own context for coming up with a more complete solution.

"Planning" would be a more accurate term for what LLMs are doing.

What I also notice is that they very easily get stuck on a specific approach to solving a problem. One prompt that has been amazing for this is:

> Act as if you're an outside observer to this chat so far.

This really helps in a lot of these cases.

Like, dropping this in the middle of the conversation to force the model out of a "local minimum"? Or restarting the chat with that prompt? I'm curious how you use it to make it more effective.
Yeah, exactly: forcing it out of a "local minimum" is a neat way to describe it. I sometimes drop this in the middle of the conversation. Works wonders. You just have to tell it it's stuck in a loop and it will suddenly pretend (?) to be self-aware.
That’s a cool tip; I usually just give up and start a new chat.
I also find them very good for debugging.