by kiitos 376 days ago
Right, so: you think that you're "deciding what gets built and how it's designed" by iterating on the prompts that you feed to the LLM that generates the code.

> My prompts specify very precisely what should be implemented.

And the precision of your prompt's specifications has no reliable impact on exactly what code the LLM returns as output.

> With the details I provided, combined with the OAuth spec, there was really very little room left for any creativity in the code. It was basically connect-the-dots at that point.

I truly don't know how you can come to this conclusion if you have any amount of direct experience with any of the current-gen LLM tools. No amount of prompt engineering gets you a reliable mapping from input query to output code.

> I designed the end-to-end encryption scheme and told it in detail how to implement it. I pointed out bugs and explained how to fix them. And so on.

I guess my response here is that, if you think that this approach to prompt engineering gets you a generated code result that is in any sense equivalent, or even comparable, in terms of quality, to the work that you could produce yourself, as a professional and senior-level software engineer, then, man, we're on different planets. Pointing out bugs and explaining how to fix them in your prompts in no way gets you deterministic, reliable, accurate, high-quality code as output. And actually, forget about high-quality: I mean even just bare-minimum, table-stakes, requirements-satisfying code!


Nobody has claimed to be getting deterministic outputs from LLMs.

> My prompts specify very precisely what should be implemented. I specified the public API and high-level design upfront. I let the AI come up with its own storage schema initially but then I prompted it very specifically through several improvements (e.g. "denormalize this table into this other table to eliminate a lookup"). I designed the end-to-end encryption scheme and told it in detail how to implement it. I pointed out bugs and explained how to fix them. And so on.

OK. Replace "[expected] deterministic output" with whatever term best fits what this block of text is describing, as that's what I'm talking about. The claim is that a sufficiently-precisely-specified prompt can produce reliably-correct code. Which is just clearly not the case, as of today.

I don't even think anybody expects reliably-correct code. They expect code that can be made as reliably as they themselves could make code, with some minimal amount of effort. Which clearly is the case.

Forget about reliably-correct. The code that any current-gen LLM generates, no matter how precise the prompt it's given, is never even close to the quality standards expected of any senior-level engineer, in any organization I've been a part of, at any point in my career. They never produce code that is as good as what I can create. If the LLM-generated code you're seeing passes this level of muster, in your view, then that's really a reflection of your situation(s), and 100% not any kind of truth that you can claim as part of a blog post or whatever...

> The code that any current-gen LLM generates, no matter how precise the prompt it's given, is never even close to the quality standards expected of any senior-level engineer, in any organization I've been a part of, at any point in my career.

You are just making assertions here with no evidence.

If you prompt the LLM for code, and then you review the code, identify specific problems, and direct the LLM to fix those problems, and repeat, you can, in fact, end up with production-ready code -- in less time than it would take to write by hand.

Proof: My project. I did this. It worked. It's in production.

It seems like you believe this code is not production-ready because it was produced using an LLM which, you believe, cannot produce production-ready code. This is a circular argument.

> If you prompt the LLM for code, and then you review the code, identify specific problems, and direct the LLM to fix those problems, and repeat, you can, in fact, end up with production-ready code

I guess I will concede that this is possible, yes. I've never seen it happen, myself, but it could be the case, at some point, in the future.

> in less time than it would take to write by hand.

This is my point of contention. The process you've described takes far longer than it would take a competent senior-level engineer to just type the code from first principles. No meaningful project has ever been bottlenecked on how long it takes to type characters into editors.

All of that aside, the claim you're making here is that, speaking as a senior IC, the code that an LLM produces, guided by your prompt inputs, is more or less equivalent to any code that you could produce yourself, even controlling for time spent. Which just doesn't match any of my experiences with any current-gen LLM or agent or workflow or whatever. If your universe is all about glue code, where typing is enemy no. 1, and details don't matter, then fair enough, but please understand that this is not usually the domain of senior-level engineers.

It's possible kiitos has (or had?) a higher standard in mind for what should constitute a senior/"lead engineer" at Cloudflare and how much they should be constrained by typing as part of implementation.

Out of interest: how long did the entire process take, and how long would you estimate it would have taken without the LLM in the loop?

100% this. I have the same proof… in production… across 30+ services… hourly…

The genetic fallacy is a hell of a drug.