Hacker News
Ask HN: How is all new software not broken?
1 points by zwilderrr 19 days ago
Google claimed at Google I/O that they shipped "100 features in 100 days" for Antigravity. I can barely get Claude Code to implement a pnl script for historical stock market data without serious bugs or drift. What are they doing differently, or what am I doing wrong?
4 comments

As with many things it matters how you do it.

Some things I do to increase my luck with AI coding:

Always keep your mind engaged in the problem. (Failing to do this is the precursor to project failure.)

Start with a plan. Review the plan and really think it through.

Work inside a functional framework or template (don’t reinvent the wheel)

Use collaborative language with your coding assistant.

Work in small chunks. (Build a feature in a feature branch, send it to a colleague for review.)

Test often. (AI is good at catching and diagnosing its own mistakes. Have it write lightweight tests and run them automatically.)
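To make the test-often tip concrete, here is a minimal sketch of lightweight tests around a PnL calculation like the one in the question. Everything in it is an assumption for illustration: the `realized_pnl` name, the `(side, quantity, price)` trade layout, and the choice of FIFO lot matching for a long-only position. It is not taken from anyone's real script.

```python
from collections import deque


def realized_pnl(trades):
    """Realized PnL for one symbol using FIFO lot matching (long-only).

    trades: iterable of (side, quantity, price), where side is "buy" or "sell".
    Hypothetical helper, written only to have something to test.
    """
    lots = deque()  # open buy lots, each stored as [remaining_qty, price]
    pnl = 0.0
    for side, qty, price in trades:
        if side == "buy":
            lots.append([qty, price])
        elif side == "sell":
            remaining = qty
            while remaining > 0:
                if not lots:
                    raise ValueError("sell exceeds open position")
                lot = lots[0]
                matched = min(remaining, lot[0])
                pnl += matched * (price - lot[1])
                lot[0] -= matched
                remaining -= matched
                if lot[0] == 0:
                    lots.popleft()  # oldest lot fully consumed
        else:
            raise ValueError(f"unknown side: {side!r}")
    return pnl


def test_single_round_trip():
    assert realized_pnl([("buy", 10, 100.0), ("sell", 10, 110.0)]) == 100.0


def test_fifo_partial_fill():
    trades = [("buy", 10, 100.0), ("buy", 10, 120.0), ("sell", 15, 130.0)]
    # 10 * (130 - 100) + 5 * (130 - 120) = 350
    assert realized_pnl(trades) == 350.0


def test_oversell_is_rejected():
    try:
        realized_pnl([("sell", 1, 100.0)])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

The tests are small enough to hand-check, which is the point: if the assistant later rewrites the matching logic and one of these numbers drifts, you find out on the next run instead of in a report. They can be run with a test runner such as pytest or just called directly.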

I have an area of tech that I know backwards and forwards. I can often one-shot a feature or product in that field.

I also branch into areas I don’t know as well. When I’m on, it’s like magic. When it goes wrong, I’m an idiot apprentice playing with the master’s tools.

I’ve successfully shipped a dozen projects.

I can often get into flow and ship something in four hours that would have taken me two weeks before.

I have also overstepped and built things I don’t know how to debug. (I called an expert, and he diagnosed and fixed the problem in 30 seconds.)

Google: how many thousands (or millions?) of employees

You: 1 person

Consider the dramatic difference in capacity.

Google has strong software development lifecycle (SDLC) practices. This helps them ensure reliability in the face of unreliable coders.

They publish code style guides. They have a thing called “readability” where employees qualify to be code reviewers by proving they can write maintainable code. They have a strong culture of testing. They have a program called Testing on the Toilet where they teach you a small idea about software testing while you sit on the toilet. (Yes, really.)

All these things allow an organization to ship reliable code even when not everyone is brilliant.

This is great for AI, which doesn’t write good code. It writes average code 10x faster. The SDLC runs that code through the gauntlet, and at the end of the day they might be able to ship reliable code 5x faster than they could with human-written code.

If you want to succeed you might need to replicate some parts of that lifecycle.

noted. ty
It's uncertain what you are doing wrong.

I work in Big Tech™. I will tell you that they are certainly not lying. Though maybe they are overstating how much velocity they've gotten. Or at least generously crediting llms for gains that come from refining their pdlc to maximize the output that llms facilitate.

On Hacker News, some people would have you believe that llms are essentially useless, produce only garbage, and literally everyone that says they're productive with them is a liar or has ai psychosis. But that's simply not the case.

In order to have a serious convo about what you're doing wrong, you'd need to describe your workflow and codebase more.

My experience is that it takes sustained use to build an intuition about how to work with them / maximize the value prop of the llm.

I've also noticed that it really is only an extension of how good of an engineer you are. If you start to step outside of your domain, the llm will amplify that lack of knowledge. If you don't know how to assess and steer the output, then it will produce slop.

My workflow is essentially:

1. Apply selective context building. Start the session by pointing out files and concepts within the codebase that are relevant to the task at hand.
2. At the end of that turn, I'll write a file that describes the work that I want accomplished. I'll tell the agent, "read this file, create a plan".
3. Usually the agent will have some questions or miss things, so I answer the questions in the plan file. I'll revise it myself and add notes.
4. Clear context, execute plan.

Afterwards, I encapsulate expected flows into end to end or integration tests that exercise the work. Sometimes the final result needs iterating. I'll also write performance tests or visual regression tests depending on the domain.
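As a rough illustration of pinning an expected flow with an end-to-end test, here is a minimal sketch. The pipeline is made up for the example: the `load_prices`, `daily_pnl`, `write_report`, and `run_pipeline` names, the CSV column names, and the constant-position PnL rule are all assumptions, not a description of any real codebase.

```python
import csv
import tempfile
from pathlib import Path


def load_prices(path):
    """Read (date, close) rows from a CSV file with a header line."""
    with open(path, newline="") as f:
        return [(row["date"], float(row["close"])) for row in csv.DictReader(f)]


def daily_pnl(prices, shares):
    """PnL per day for a constant position: shares * change in close."""
    return [
        (date, shares * (close - prev_close))
        for (_, prev_close), (date, close) in zip(prices, prices[1:])
    ]


def write_report(rows, path):
    """Write (date, pnl) rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "pnl"])
        writer.writerows(rows)


def run_pipeline(in_path, out_path, shares):
    """The whole flow: read prices, compute PnL, write the report."""
    write_report(daily_pnl(load_prices(in_path), shares), out_path)


def test_pipeline_end_to_end():
    # Exercise the flow through real files instead of testing each step alone.
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "prices.csv"
        out_path = Path(tmp) / "report.csv"
        in_path.write_text(
            "date,close\n2024-01-02,100.0\n2024-01-03,102.5\n2024-01-04,101.0\n"
        )
        run_pipeline(in_path, out_path, shares=10)
        with open(out_path, newline="") as f:
            rows = list(csv.reader(f))
    assert rows == [
        ["date", "pnl"],
        ["2024-01-03", "25.0"],
        ["2024-01-04", "-15.0"],
    ]
```

The test only knows about the input file and the output file, so the agent is free to restructure the internals on a later iteration as long as the observable result stays the same.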

There's more, but I'll stop riffing. I've found that the above workflow tremendously improves the speed at which I'm able to iterate and ship.

appreciate that. i generally don't do that selective context building step. i instead have claude plan, then run that plan through multiple iterations where i and/or codex reviews the plan and gets it to where it should be. but so often little things slip in that are maddening.

to be fair, i am not doing a line by line code review. for that i also defer to multiple iterations, where i'm reviewing each finding, but my scope is limited to what's surfaced by the agent.

but it's hard to believe big tech is reviewing line by line when they're pushing out thousands of lines.

I personally think they're lying, or at least being very generous with the truth.