by brainless 333 days ago
I have been using Claude Code full-time for about 6 weeks* with the $20/month subscription. I am trying out building different products from ideas I already had. It frees up a lot of my time to talk about my founder journey.

I have not needed multiple agents, or to run CC over an SSH terminal overnight. The main reason is that LLMs are often wrong, so I still need time to test: run the whole app, check what broke in CI (GitHub Actions), etc. I do not go through code line by line anymore and I organize work with tickets (sometimes they are created with CC too).

Both https://github.com/pixlie/Pixlie and https://github.com/pixlie/SmartCrawler are vibe coded (barely any code that I wrote). With LLMs you can generate code 10x faster than writing it manually. It means you can also get 10x the errors. So the manual checks take some time.

Our existing engineering practices are very helpful when generating code through LLMs, and I do not have the mental bandwidth to review a mountain of code. I am not sure that scaling out LLMs will help in building production-quality software. I already see that sometimes CC makes really poor guesses. Imagine many such guesses in parallel, daily.

edit: typo - months/weeks

5 comments

If there is barely any code in those repos that you wrote, how can you license them under the GPL? You don't hold the copyright for it.

This genuinely isn't an attack, I just don't think you can? The AI isn't granted copyright over what it produces.

I can only talk about the law in England & Wales, but:

For code generated by an LLM the human user would likely be considered the author if you provided sufficient creative input, direction, or modification.

The level of human involvement matters, simply prompting "write me a function" might not be enough, but providing detailed specifications, reviewing, and modifying the output would strengthen the claim.

The Copyright, Designs and Patents Act 1988 (CDPA), Section 9(3) states, "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken". This was written before LLMs existed, but recent academic literature has supported this position, https://academic.oup.com/jiplp/article/19/1/43/7485196?login...

However, a comparable situation was tested with Thaler v Comptroller-General, where courts emphasised that legal rights require meaningful human involvement, not just ownership of the AI system. - https://www.culawreview.org/journal/unlocking-the-canvas-a-l... and https://www.whitecase.com/insight-our-thinking/uk-supreme-co...

I do acknowledge there is uncertainty, and this is highlighted here in "The Curious Case of Computer-Generated Works under the Copyright, Designs and Patents Act 1988.", with "section 9(3): the section is either unnecessary or unjustifiably extends legal protection to a class of works which belong in the public domain" - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4072004

Today, I think it's doubtful that a functional application can be entirely vibe coded without decent direction and modification, but I don't think that will always be the case.

At least for art there is already precedent in US courts, with someone trying to copyright an image generated by Midjourney and it getting revoked in 2022, because AI-generated art cannot be copyrighted.

for code it hasn't been challenged yet, but I find it doubtful they'd decide differently there

I was reading Doe 1 v. GitHub for my paper. The case involves open source developers suing over GitHub Copilot, which was trained on, and generates, open source code, including code under MIT and AGPL licenses.

So far, the judge believes that training models on open source code is not a license violation, as the code is public for anyone to read, but for "distribution or redistribution" (I assume, of the model's outputs?) it is still up to the court to decide whether that violates the terms of the license, among other laws.

The case has currently moved to the Ninth Circuit without a decision in the district court, as there are other similar cases (such as the Authors Guild's) and they wanted the courts to offer consistent rules. I believe one of the big delays in the case is over damages: I think the plaintiffs tried to ask for details of Microsoft's valuation of GitHub when it was acquired, as GitHub's biggest asset is the Git repositories and that may provide a monetary value for how much each project is worth. Microsoft is trying to stall and not reveal this.

Assuming you're referring to Thaler v. Perlmutter, Thaler claimed to the copyright office that the image at issue was "autonomously created by a computer algorithm running on a machine". So the question of "if you claim the LLM did it itself" is settled (shocker, cf. Naruto v. Slater, 888 F.3d 418), but that definitely did not settle "_I_ used the LLM to do it".
Tbf, IANAL and was only repeating what journalists wrote back then. Ultimately, I have no deeper knowledge of the laws in question and thus don't have a qualified opinion on the matter.
Also, if there isn't enough human involvement for the code to be copyrightable, then it's basically equivalent to being in the public domain. This is more permissive than any code license (e.g. GPL), so should be fine no matter what.
I'm not so sure about that.

The legal standards in the United States for software copyrights are Jaslow and Altai, known to Federal courts as SSO [0] and AFC [1], respectively.

These standards consider the overall structure of code as being copyrightable. This means that you can't just rename a bunch of variables and class names. The overall organization of the code is considered an arbitrary expression. Someone would be infringing on copyright if they took your Java code and converted it to Python with different class, variable and function names but kept the same relationships between classes and the same general structure.

So what does this have to do with LLMs? Well, if the author directed the code to be structured in a certain way, directed to create specific APIs, etc, then there is a legal argument that the author has at least copyright over the arbitrary and expressive decisions that were made while building a software system.

[0] https://en.wikipedia.org/wiki/Structure,_sequence_and_organi...

[1] https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...

This is highly speculative, IANAL.
I have not thought about it; it is one of the things on my list. But my understanding was that developers copy code from Stack Overflow, as an example. It is not "my" code but I still am the author. Or let's say I ask my friend to add code and she/he simply passes the code over to me. I author it in my name.

The "barely" part may be important and I would like to know what others are doing.

I don't think you can just willy-nilly copy code from StackOverflow and sign it with your name. Its license forbids it. You also can't just sign your friend's code with your name unless she explicitly gives you permission. In both cases you are not the author of that code.

I get that people do it anyway but I guess it's kind of a grey-area because it's hard to tell after the fact that some snippet has been copied from SO.

I got a patch rejected (rightfully so IMO) a long time ago from libvirt (Red Hat) because I was using (and mentioning) code taken from StackOverflow.
So what would be the status of this code? Nobody holds the copyright for it, so anyone can use it in any way and nobody can sue for anything. It's not GPL, but it sounds pretty open source to me.
Yes, my understanding is that non-humans in the USA cannot be granted copyright. This puts the work in the public domain, which means it can't be relicensed.

There was a much appealed case of a monkey taking a photo, where it was decided the photo was in the public domain.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

It boiled down to the creator not being a "legal person" and so could not hold copyright.

The real problem for software is where the line is for a "sufficient" transformation from the source material by a human to make it acquire copyright. You can write a Dickens' character derived novel and have copyright in it, but not gain control over those characters as Dickens described them.

Can you buy a Jules Verne book, add comments and claim copyright on the whole book?

Claim partial copyright without specifying clearly what exactly?

Absolutely.

People sell annotated Bibles, or Shakespeare etc. You can transform it into something that can acquire copyright, but it must have an artistic step.

This is a big thing in the fine art world as well, you can take inspiration, you can in some circumstances outright copy, but then you need to transform it sufficiently that it becomes your own art. People argue in front of judges about this stuff, of course.

Verne is a good example too, because if you print an English version, the translator acquires copyright in the translated version.

IANAL but the AI is a tool and presumably the code should be treated as any other auto-generated code. It's the product of the tool user.

Unless the product includes code licensed by others, then - like any other repo - I don't see any license issue here.

If you mean there is no insight as to whether licensed code is included, that's one of the constraints of vibe-coding (which people often confuse with AI-assisted coding).

It's the job of the user to check and curate the contributions as they would any third-party human input (e.g. via PRs). Again though, that's not an AI coding issue, but a human process decision.

No, there is no copyright.

If you tried to sue someone for copyright infringement based on code that an LLM generated for you, you'd be laughed out of court.

But you were the one that used the LLM to generate it, so that’s your code, surely - how would unlicensed use of your code not violate copyright? Why didn’t they ‘just’ use an LLM to generate their own code?
The product is not owned by the tool user.

Use a hammer, you own the output. Use an intern, the intern does.

Of course if you aren't a person you can't own anything.

Correct. No copyright. No legal teeth to the GPL.

Take whatever you want and relicense it cause it doesn't belong to the "author"

Lolololololol "author"

What else should they do then? GPL is a good call. Most keep the output under their own personal/company IP.
> What else should they do then?

They can say the code is in the public domain.

This is distinct from open source, yes, but in almost all cases less restricted than anything with a (open source or otherwise) license.

My trick is: ask for a plan. Revise the plan. Then ask it to work on only a single step of the plan, keeping progress incremental, then ask for tests for that step, and keep hitchhiking along in incremental steps.
After getting lazy, relying on it too much, and getting burned, I now try to use it as a replacement for typing. I have a similar workflow to you. I just go through the plan together with it, review what’s happened every few steps, and then correct. It’s actually made me a better PR reviewer, I’ve noticed :)
Yes, and this also helps to highlight which problems I should be solving and which problems Claude Code can solve. Like, tbh, building efficient data structures is not Claude’s thing; he seems happy to just hack together some nonsense that leads to spaghetti being shot into all corners of your repo. But by iteratively building up plans and todo lists I find Claude is able to resist the temptation to hack everything all at once to solve the immediate problem in front of his face.
Looking at ChatInterface.tsx/handleSendMessage in https://github.com/pixlie/Pixlie/commit/3c0bd23ff16c0fcdac80..., I'd have rejected this if it came up in a PR and would not consider this production quality software.

Rewriting it to something sane would be harder and more time consuming than just writing a decent implementation upfront.

This may not be the best example, but it’s worth considering: if the code is never meant to be reviewed by humans, and it’s an ephemeral implementation detail that will only ever be read & edited by machines, do certain traditional measures of software quality even matter anymore inside the module boundary?
If no one checked it, no one is accountable. Is that how you do business? Or is that how only a few corporations are able to do business?
This sounds like a terrible situation to walk into when the pager goes off.

"Oh hey <boss>, can you update the status page to say we can't really understand the code and don't have an ETA, but we're trying to talk the AI into correcting whatever the issue is?"

This is an excellent point and one that I am chasing. I do not want software (at least the software I produce) to be inferior to what I would hand code. LLMs give me huge velocity but I am still learning where to put guardrails so that quality stays at the level of what I would do myself.

Now the critical point: what if my own quality is inferior to many others'? I think we will have this issue a lot since LLMs can generate code at 10x speeds. The quality will be what the human operator sets. Or, maybe, more and more tools will adopt best practices as guardrails and not give humans much option to steer away from them.

'Tools with guardrails' are common. Wordpress, RAD, low-/no-code and so on. A lot of enterprise software is produced in a cycle where software interviews middle managers and writes code that then generates code that becomes part of a system.

Reinventing this space but make it slow and expensive seems like it's not a serious business idea. I believe the business idea behind coding LLM SaaS is actually about looking into other corporations and seeing what they do and how.

What would you change in that function?
It does a lot under a rather weird name. The fetch should be behind a client facade elsewhere, as should the response handler. I'd also shorten the name.

Probably more that would irk me if I looked closely.
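
A rough sketch of the split suggested above, assuming the handler posts JSON to a chat endpoint. `ChatClient`, `HttpPost`, `parseReply`, `sendMessage`, and the `/api/chat` path are all hypothetical names for illustration; none of them are taken from the Pixlie repository.

```typescript
// All names here are hypothetical illustrations, not code from the linked commit.
type ChatReply = { text: string };

// Minimal shape of the HTTP call the facade depends on, so a stub can be passed in tests.
type HttpPost = (
  url: string,
  body: string
) => Promise<{ ok: boolean; status: number; body: string }>;

// Response handling lives in one place, away from the UI component.
function parseReply(raw: string): ChatReply {
  const data: unknown = JSON.parse(raw);
  if (
    typeof data === "object" &&
    data !== null &&
    typeof (data as { text?: unknown }).text === "string"
  ) {
    return { text: (data as { text: string }).text };
  }
  throw new Error("unexpected chat response shape");
}

// Client facade: the only code that knows the URL and the wire format.
class ChatClient {
  private readonly post: HttpPost;
  private readonly baseUrl: string;

  constructor(post: HttpPost, baseUrl: string) {
    this.post = post;
    this.baseUrl = baseUrl;
  }

  async send(message: string): Promise<ChatReply> {
    const res = await this.post(
      `${this.baseUrl}/api/chat`,
      JSON.stringify({ message })
    );
    if (!res.ok) {
      throw new Error(`chat request failed with status ${res.status}`);
    }
    return parseReply(res.body);
  }
}

// The UI handler shrinks to orchestration: validate input, call the client, return text.
async function sendMessage(client: ChatClient, input: string): Promise<string> {
  const trimmed = input.trim();
  if (trimmed === "") {
    return "";
  }
  const reply = await client.send(trimmed);
  return reply.text;
}
```

With this shape the component only calls `sendMessage`, and the fetch and parsing logic can be tested without rendering anything.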

You’re missing the point. This is not production quality software. This is mass produced software. You aren’t supposed to look at all the internals, just the really critical parts.
This is a very critical point. I think code generated by LLMs will lower code generation cost, that means mass produced software.

But: if we reduce cost, then can we not add deterministic guardrails in software that are also maintained at LLM speed and cost? This is pretty much what I am trying to understand. Choice of Rust/TypeScript in my projects is very intentional and you may see why.
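
One possible reading of "deterministic guardrails", assuming it means checks the compiler enforces regardless of who (or what) wrote the code: an exhaustive `switch` over a discriminated union. The `MessageState` type and `describe` function are hypothetical and only illustrate the idea.

```typescript
// Hypothetical message states for a chat UI; not taken from any repo in this thread.
type MessageState =
  | { kind: "pending" }
  | { kind: "sent"; id: string }
  | { kind: "failed"; reason: string };

function describe(state: MessageState): string {
  switch (state.kind) {
    case "pending":
      return "sending...";
    case "sent":
      return `sent as ${state.id}`;
    case "failed":
      return `failed: ${state.reason}`;
    default: {
      // If a new variant is added to MessageState and not handled above,
      // this assignment no longer type-checks, so the build fails.
      const unreachable: never = state;
      throw new Error(`unhandled state: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

If generated code adds a fourth state and forgets a case, the type checker flags it before anyone has to spot it in review.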

So, according to you, it's like Wordpress all over again and a lot of work incoming for people that can actually build and maintain software.

Edit: And more critically, a lot of work incoming for people that can teach software development.

Yes. Someone will need to manage the 10-100x mess of LLM spaghetti codebases after they find product-market fit.
I'm actually building something to fix this.

The biggest bottleneck for background agents is code review.

I'm building a tool that can give the first pass so the result of the background agent isn't garbage most of the time.

you just rick rolled me!
I am so sorry, it was not intentional at all. I had asked Claude Code to keep a placeholder video. I did not even know what "rick rolling" means. Just searched and understood what I have done!
Apparently this is typical of LLMs: https://chatgpt.com/s/t_687a285d9f708191a884d2ad39ddcb53

I mean, it's probably the most linked YouTube video by a factor of 100x so it makes sense for it to be hardcoded in the model.