Hacker News new | ask | show | jobs
by thefourthchime 583 days ago
For years I've kept a list of apps / ideas / products I may do someday. I never made the time, with Cursor AI I have already built one, and am working on another. It's enabling me to use frameworks I barely know, like React Native, Swift, etc..

The first prompt (with o1) will get you 60% there, but then you have a different workflow. The prompts can get to a local minimum, where claude/gpt4/etc.. just can't do any better. At which point you need to climb back out and try a different approach.

I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.

5 comments

> For years I've kept a list of apps / ideas / products I may do someday. I never made the time, with Cursor AI I have already built one, and am working on another.

This is one fact that people seem to severely under-appreciate about LLMs.

They're significantly worse at coding in many aspects than even a moderately skilled and motivated intern, but for my hobby projects, until now I haven't had any intern that would even as much as taking a stab at some of the repetitive or just not very interesting subtasks, let alone stick with them over and over again without getting tired of it.

It also reduces the knowledge needed. I don't particularly care about learning how to setup and configure a web extension from scratch. With LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in. As somebody with ADHD, it was primarily all that supplementary, tangential knowledge which felt like an insurmountable mountain to me and made it impossible to actually try all the ideas I'd had over the years. I'm so much more productive now that I don't have to always get into the weeds for every little thing, which could easily delay progress for hours or even days. I can pick and choose the parts I feel are important to me.
> It also reduces the knowledge needed. I don't particularly care about learning how to setup and configure a web extension from scratch. With LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in.

Eh, I would argue that the apparent lower knowledge requirement is an illusion. These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output. If you've ever participated in a code review, you'll know that doing that takes much more effort than actually writing the code yourself.

Not only that, but relying on these tools handicaps you into not actually learning any of the technologies you're working with. If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again, and good luck if that's a critical production issue. If instead you take the time to read the documentation and understand how to use the technology, perhaps even with the _assistance_ of an AI tool, then it might take you more time and effort upfront, but this will pay itself off in the long run by making you more proficient and useful if and when you need to work on it again.

I seriously don't understand the value proposition of the tools in the current AI hype cycle. They are fun and useful to an extent, but are severely limited and downright unhelpful at building and maintaining an actual product.

[1]: https://openai.com/index/introducing-simpleqa/

> These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output.

Definitely, but what LLMs provide me that a purely textual interface can't is discoverability.

A significant advantage of GUIs is that I get to see a list of things I can do, and the task becomes figuring out which ones are going to solve my problem. For programming languages, that's usually not the case (there's documentation, but that isn't usually as nested and context sensitive as a GUI is), and LLMs are very good at bridging that gap.

So even if an LLM provides me a broken SQL query for a given task, more often than not it's exposed me to new keywords or concepts that did in fact end up solving my problem.

A hand-crafted GUI is definitely still superior to any chat-based interface (and this is in fact a direction I predict AI models will be moving to going forward), but if nobody builds one, I'll take an LLM plus a CLI and/or documentation over only the latter any day.

> OpenAI's flagship models are not even correct 50% of the time[1]

Where does [1] go? In any case, try Anthropic's flagship:

91% > 50.6%

https://aider.chat/docs/leaderboards/#code-refactoring-leade...

> OpenAI's flagship models are not even correct 50% of the time[1]

You're reading the link wrong. They specifically picked questions that one or more models failed at. It's not representative of how often the model is wrong in general.

From the paper:

> At least one of the four completions must be incorrect for the trainer to continue with that question; otherwise, the trainer was instructed to create a new question.

All the projects I've been able to start and make progress in in the past year vs the ten years before that are substantive enough proof for me that you're wrong in pretty much all of your arguments. My direct experience proves statements like "the lower knowledge requirement is an illusion" and "it takes much more effort to review code than to write it" wrong. I do code reviews all the time. I write code all the time. I've had AI help me with my projects and I've reviewed and refactored that code. You're quite simply wrong. And I don't understand why you're so eager to argue that my direct experience is wrong, as if you're trying to gaslight me.

It's quite honestly mystifying to me.

It's simply not the case that we need to be experts in every single part of a software project. Not for personal projects and not for professional ones either. So it doesn't make any sense to me not to use AI if I've directly proven to myself that it can improve my productivity, my understanding and my knowledge.

> If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again

This is proof to me that you haven't used AI much. Because AI has helped me understand things much quicker and with much less friction than I've ever been able to before. And I have often been able to solve things AI has had issues with, even if it's a topic I have zero experience with, through the interaction with the AI.

At some point, being able to make progress (and how that affects the learning process) trumps this perfect ideal of the programmer who figures out everything on their own through tedious, mind-numbing long hours solving problems that are at best tangential to the problems they were actually trying to solve hours ago.

Frankly, I'm tired of not being able to do any of my personal projects because of all the issues I've mentioned before. And I'm tired of people like you saying I'm doing it wrong, DESPITE ME NOT BEING ABLE TO DO IT AT ALL BEFORE.

Honestly, fuck this.

It’s baffling to see all the ignorant answers to this thread, OP. My experience has been similar to yours, and I’ve been pushing complex software to production for the past 20 years.

Feels like a bunch o flat earth arguments; they’d rather ignore evidence (or even try out by themselves) to keep the illusion that you need to write it all yourself for it to be “high quality”.

Or, hey, maybe we've just had different experiences, and are using these tools differently? I even concede that I may not be great at prompting, which could be the cause of my problems.

I'm not arguing that writing everything yourself leads to higher quality. I'm arguing that _in my experience_ a) it takes more time and effort to read, troubleshoot and fix code generated by these tools than it would take me to actually write it myself, and b) that taking the time to read the documentation and understand the technologies I'm working with would actually save me time and effort in the future.

You're free to disagree with all of this, but don't try to tell me my experience is somehow lesser than yours.

Thanks, my guess is that many complaining about the technology haven't honestly tried to embrace it.
I understand your frustration. It's like someone trying to convince me that a red car I'm looking at is actually blue. I know what I'm seeing and experiencing. There's nothing theoretical about it and I have the results right in front of me.
Hey, I'm not trying to gaslight you into anything. I'm just arguing from my point of view, which you're free to disagree with.

You're right that I've probably used these tools much less than you have. I use them ocasionally for minor things (understanding an unfamiliar API, giving me hints when web searching is unhelpful, etc.), but even in my limited experience with current state of the art services (Claude 3.5, GPT-4o) I've found them to waste my time in ways I wouldn't if I weren't using them. And at the end of the day, I'm not sure if I'm overall more productive than I would be without them. This limited usage leads me to believe that the problem would be far worse if I were to rely on them for most of my project, but the truth is I haven't actually tried that yet.

So if you feel differently, more power to you. There's no point in getting frustrated because someone has a different point of view than you.

I'm not frustrated with you but I'll explain why you might be getting get the vibes here.

Its like people are learning about these new things called skis.

They fall on their face a few times but then they find "wow much better than good old snowshoes!"

Of course some people are falling every 2 feet while trying skis and then go to the top of the mountain and claim skis are fake and we should all go back to snowshoes because we don't know about snow or mountains.

They are insulting about it because its important to the ragers that, despite failing at skiing, they are senior programmers and everyone else doesn't know how to compile, test and review code and they must be hallucinating their ski journeys!

Meanwhile a bunch of us took the falls and learned to ski and are laughing at the ragers.

The frustrating thing though is that for all the skiiers we can't seem to get good conversations about how to ski because there is so much raging... oh well.

This desire of deniers to prove to people who actually get tons of benefit of LLMs that they aren't getting it is becoming more ridiculous every time.

"You can't use LLMs for this or that because of this and that!!!".

But I AM using them. Every. Single. Day.

And of course every time such comments get downvoted. Folks, you can downvote as much as you want - I don't give a fuck even if my reputation goes negative. This won't make you right.
Things have improved considerably over the last 3 months. Claude with cursor.ai is certainly over 50%
I haven't used cursor.ai, but Claude 3.5 Sonnet definitely has the issues I'm talking about. Maybe I'm not great at prompting, but this is far from an exact science. I always ask it specific things I need help with, making sure to provide sufficient detail, and don't ask it to produce mountains of code. I've had it generate code that not only hallucinates APIs, but has trivial bugs like referencing undefined variables. How this can scale beyond a few lines of code to produce an actually working application is beyond me. But apparently I'm in the minority here, since people are actually using these tools successfully for just that, so more power to them.
I think it really depends on the language. It generates pretty crap but working python code, but even for SQL it generates really weird crummy code that often doesn't solve the problem.

I find it really helpful where I don't know a library very well but can assess if the output works.

More generally, I think you need to give it pretty constrained problems if you're working on anything relatively complicated.

Where the libraries are new/not known to the LLM yet, I just go find the most similar examples in the docs and chuck them in the context window too (easy to do with aider.) Then say 'fix it'. Does an incredible job.
I'm curious: what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced, but has no idea how to fix?

In my experience of these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?

More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.

[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule

> what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced, but has no idea how to fix?

Same thing I do without an LLM: I try to fix it myself!

> If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself.

Definitely not in the cases I'm thinking about. This extends from "build me a boilerplate webapp that calls this method every time this form changes and put the output in that text box" (which would take me hours to learn how to do in any given web framework) to "find a more concise/idiomatic way to express this chain of if-statements in this language I'm unfamiliar with" (which I just wouldn't do if I don't much care to learn that particular language).

For the UI/boilerplate part, it's easy enough to tell if things are working or not, and for crucial components I'll at least write tests myself or even try to fully understand what it came up with.

I'd definitely never expect it to get the "business logic" (if you want to call it that for a hobby project) right, and I always double-check that myself, or outright hand-write it and only use the LLM for building everything around it.

> The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%.

What I enjoy most about programming is exactly solving complicated puzzles and fixing gnarly bugs, not doing things that could at least theoretically be abstracted into a framework (that actually saves labor and doesn't just throw it in an unknown form right back at me, as so many modern ones do) relatively easily.

LLMs more often than not allow me to get to these 10% much faster than I normally would.

These two projects were almost entirely written with LLMs:

https://github.com/williamcotton/search-input-query

https://github.com/williamcotton/guish

Both are non-trivial but certainly within the context window so they're not large projects. However, they are easily extensible due to the architecture I instructed as I was building them!

The first contains a recursive descent parser for a search query DSL (and much more).

The second is a bidirectional GUI for bash pipelines.

Both operate at the AST level, guish powered by an existing bash parser.

The READMEs have animated gifs so you can see them in action.

When the LLM gets stuck I either take over the coding myself or come up with a plan to break up the requests into smaller sized chunks with more detail about the steps to take.

It takes a certain amount of skill to use these tools, both with how the tool itself works and definitely with the expertise of the person wielding the tool!

If you have these tools code to good abstractions and good interfaces you can hide implementation details. Then you expose these interfaces to the LLM and make it easier and simpler to build on.

Like, once you've got an AST it's pretty much downhill from there to build tools that operate on said AST.

I think there’s often a disconnect between what lay-people hear when someone says “I built an app using AI” and the reality.

What it seems like a lot people assume the process is that you give the AI a relatively high level prompt that’s a description of features, and you get a back a fully functioning app that does everything you outlined.

In my experience (and I think what you are describing here), is that the initial feature-based prompt will often give you (some what impressively) a basic functioning app. But as you start iterating on that app, the high level feature-based prompts start not working very well pretty quickly. It then becomes more an exercise in programming by proxy — where you basically tell the AI what code to write/what changes are needed at a technical level in smaller chunks, and it saves you a lot of time by actually writing the proper syntax. The thing you still have know how to program to be able to accomplish this — (arguably, you have to be a fairly decent programmer who can already reasonably break down complicated tasks into small understandable chunks).

Furthermore, if you want to AI write good code with a solid architecture you pretty much have to tell it what to do from a technical level from the start — for example, here I imagine the AI didn’t come up with structuring things to work as the AST level on its own — you knew that would give you a solid architecture to build on, so you told it to do that.

As someone whose already a half decent programmer, I’ve found this process to be a pretty significant boon to my productivity, on the other hand beyond the basic POC app, I have a hard time seeing it living up the marketing hype of “Anyone can build an app using AI!” that’s being constantly spewed.

The usual workflow I see skeptic folks take is throw a random sentence and expect the LLM to correctly figure out the end result. And then just keep sending small chunks of code expanding the context with poor instructions.

LLMs are tools that need to be learned. Good prompts aren’t hard, but they do take some effort to build.

I have seen hallucinations in comments more than in code. In some of the code hallucinations, I can correct them myself. The hallucinations are obvious. try without finally blocks etc.

So my workflow is to just review every bit of code the assistant generates and sometimes I ask the assistant (I'm using Cody) to revisit a particular portion of the code. It usually corrects and spits out a new variant.

My experience has been nothing short of spectacular in using assistants for hobby projects, sometimes even for checking design patterns. I can usually submit a piece of code and ask if the code follows a good pattern under the <given> constraints. I usually get a good recommendation that clearly points out the pros and cons of the said pattern.

You may not need that last 10% on a hobby project. If you do and it's insurmountable with AI+you then you're no worse off than when it was insurmountable with just you.

Outside that context, the better way to use the tools is as a superpowered stack overflow search. Don't know how ${library} expects you to ${thing} in ${language}? Rather than just ask "I need to add a function in this codebase which..." and pastes it into your code ask "I need an example function which uses..." and use what it spits out as an example to integrate. Then you can ask "can I do it like..." and get some background on why you can/can't/should/shouldn't think about doing it that way. It's not 100% right or applicable, especially with every ${library}, ${thing}, and ${language} but it's certainly faster to a good answer most of the time than SO or searching. Worst case failure? You've spent a couple minutes to find you need to spend a lot of time reading through the docs to do you one off thing yourself still.

That's the way I currently use them. But, just like with SO, the code could be outdated and not work with the specific version of the library you're using, or just plain wrong. There's no way to tell it to show you code using version X.Y. The code could even be a mix of different versions and APIs, or the LLM might be trained on outdated versions, etc.

Even worse, the LLM will never tell you it doesn't know the answer, or that what you're trying to do is not possible, but will happily produce correct-looking code. It's not until you actually try it that you will notice an error, at which point you either go into a reprompt-retry loop, or just go read the source documentation. At least that one won't gaslight you with wrong examples (most of the time).

There are workarounds to this, and there are coding assistants that actually automate this step for you, and try to automatically run the code and debug it if something goes wrong, but that's an engineering solution to an AI problem, and something that doesn't work when using the model directly.

> Worst case failure? You've spent a couple minutes to find you need to spend a lot of time reading through the docs to do you one off thing yourself still.

It's not a couple of minutes, though. How do you know you've reached the limit of what the LLM can do, vs. not using the right prompt, or giving enough context? The answer always looks to be _almost_ there, so I'm always hopeful I can get it to produce the correct output. I've spent hours of my day in aggregate coaxing the LLM for the right answer. I want to rely on it precisely because I want to avoid looking at the documentation—which sometimes may not even exist or be good enough, otherwise it's back to trawling the web and SO. If I knew the LLM would waste my time, I could've done that from the beginning.

But I do appreciate that the output sometimes guides me in the right direction, or gives me ideas that I didn't have before. It's just that the thought of relying on this workflow to build fully-fledged apps seems completely counterproductive to me, but some folks seem to be doing this, so more power to them.

I had a problem like this recently. I was working with a Python library that I had never worked with before, and I was relying heavily on LLMs. I was stuck at a point where no LLM could solve my problem: o1, GPT-4o, Sonnet 3.5, Gemini Pro...

Then I had an idea: as it was a picture animation problem, I asked it to write it in CSS. Then I asked it to translate it to Python. Boom, it worked!

At this moment, I finally realized the value of knowing how to prompt. Most of the time it doesn't make a difference, but when things start to get complex, knowing how to speak with these assistants makes all the difference.

If it gets stuck, I tell it where I think we took a wrong turn. It then recognizes the issue and refactors in a way that for a hobby project I wouldn’t have had the patience for.
If you have the budget, I have also taken a liking to perplexity.ai. I got it free from my school and it basically aggregates searches for me with sources (but be sure to check them since sometimes it reads between the links so to speak). It basically does the Google searching for me and have returned more up to date API info than Claude nor ChatGPT knew about. Then I would let Claude or ChatGPT know about it by copying doc and source code to work from.
That's literally going through the dark maze blindfolded, just bouncing off the walls randomly and hoping you are generally at least moving to your goal.

If software engineering should look like this, oh boy am I happy to be retiring in mere 17 years (fingers crossed) and not having to spend more time in such work. No way any quality complex code can come up from such approach, and people complain about quality of software now .

> The first prompt (with o1) will get you 60% there, but then you have a different workflow. The prompts can get to a local minimum, where claude/gpt4/etc.. just can't do any better. At which point you need to climb back out and try a different approach.

So you're basically bruteforcing development, a famously efficient technique for... anything.

Good luck debugging it on production.
This is such a lazy, pointless comment that doesn't add anything to the conversation. It's also way off base about what LLMs can actually do, and the fact that they're pretty handy for debugging production code too.
I mean i debug code other engineers wrote every single day... being good at that is part of the job. The biggest difference is i never have to deal with the LLM writing parts i don't want it to write.