Hacker News new | ask | show | jobs
by rq1 600 days ago
I did just that actually to:

* build a codegen for Idris2 and a rust RT (a parallel stack "typed" VM)

* a full application in Elm, while asking it to borrow from DT to have it "correct-by-construction", use zippers for some data structures… etc. And it worked!

* Whilst at it, I built Elm but in Idris2, while improving on the rendering part (this is WIP)

* data collators and iterators to handle some ML trainings with pausing features so that I can just Ctrl-C and continue if needed/possible/makes sense.

* etc.

At the end I had to rewrite completely some parts, but I would say 90% of the boring work was correctly done and I only had to focus on the interesting bits.

However it didn’t deliver the kind of thorough prep work a painter would do before painting a house when asked for. It simply did exactly what I asked, meaning, it did the paint and no more.

(Using 4o and o1-preview)

2 comments

I keep seeing folks who say they’ve built a “full application” using or deeply collaborating with an LLM but, aside from code that is only purported to be LLM-generated, I’ve yet to see any evidence that I can consider non-trivial. Show me the chat sessions that produced these “full applications”.
An application powered by a single source file comprised of only 400 lines of code is, by my definition, trivial. My needs are more complex than that, and I’d expect that the vast majority of folks who are trying to build or maintain production quality revenue generating code have the same or similar needs.

Again, please don’t be offended; what you’re doing is great and I dearly appreciate you sharing your experience! Just be aware that the stuff you’re demonstrating isn’t (hasn’t been, for me at least) capable of producing the kind of complexity I need while using the languages and tooling required in my environment. In other words, while everything of yours I’ve seen has intellectual and perhaps even monetary value, that doesn’t mean your examples or strategies work for all use-cases.

LLMs are restricted by their output limits. Most LLMs can output 4096 tokens - the new Claude 3.5 Sonnet release this week brings that up to 8196.

As such, for one-shot apps like these there's a strict limit to how much you can get done purely though prompting in a single session.

I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.

> As such, for one-shot apps like these there's a strict limit to how much you can get done purely though prompting in a single session.

That’s an important detail that is (intentionally?) overlooked by the marketing of these tools. With a human collaborator, I don’t have to worry much about keeping collab sessions short—and humans are dramatically better at remembering the context of our previous sessions.

> I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.

Good to know. For the larger projects where you use the models as an assistant only, do the models “know” about the rest of the project’s code/design through some sort of RAG or do you just ask a model to write a given function and then manually (or through continued prompting in a given session) modify the resulting code to fit correctly within the project?

There are systems that can do RAG for you - GitHub Copilot and Cursor for example - but I mostly just paste exactly what I want the model to know into a prompt.

In my experience most of effective LLM usage comes down to carefully designing the contents of the context.

any code to see? I like Idris so this would be interesting.