Hacker News
by girvo 501 days ago
Except when it consistently gets said particular API wrong. I was using it to do basic graphql-yoga setup with R1 and then Claude Sonnet 3.5 and they both output incorrect usage, and got stuck in a loop trying to fix it.

If it can’t do something that basic and that common using a language and toolset with that much training data, then I’m pessimistic personally.

I’m yet to see Copilot be useful for any of my juniors when we pair. It gets in the way far more than it helps, and it seems to be ruining their deeper understanding.

I’ll continue trying to use these tools, but I swear you’re overselling their abilities even still.


The way to fix that is to find an example of correct usage of that API and paste that example in at the start of the prompt.

This technique can reliably make any good LLM fluent in an API that it's never seen in its training data.
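As a rough illustration of the technique described above, here is a minimal Python sketch that puts a known-good usage example ahead of the actual request. The function name `build_prompt`, the instruction wording, and the `examplelib` snippet are all hypothetical; nothing here comes from a specific tool or model API.

```python
def build_prompt(api_example: str, task: str) -> str:
    """Place a known-good API usage example before the actual request."""
    return (
        "Here is an example of correct usage of the API:\n\n"
        f"{api_example.strip()}\n\n"
        "Follow the imports and call signatures from the example exactly.\n\n"
        f"Task: {task.strip()}\n"
    )


if __name__ == "__main__":
    # Hypothetical snippet; in practice this is copied from the library's docs.
    example = "from examplelib import connect\nclient = connect(url='...')"
    print(build_prompt(example, "Set up a basic server using this library."))
```

The only design choice is ordering: the example comes first, so the model reads the correct usage before it reads the request.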

At this point with Cursor you can have it index the online docs by giving it a base URL and have it automatically RAG the relevant content into the chat (using the @ symbol to reference the docs). Both Windsurf and Cursor also support reading from URLs (iirc Aider does too).

I’ve had better luck with manually including the page but including the indexed docs is usually enough to fix API mistakes.
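For a sense of what "RAG the relevant content into the chat" involves, here is a toy Python sketch: score documentation chunks by word overlap with the question and prepend the best matches. This is an assumption-heavy stand-in, not how Cursor or Windsurf index docs; real tools typically use embeddings, and every name below is hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(tokenize(c) & q), reverse=True)
    return ranked[:k]


def prompt_with_docs(chunks: list[str], question: str, k: int = 2) -> str:
    """Prepend the retrieved documentation chunks to the question."""
    context = "\n---\n".join(retrieve(chunks, question, k))
    return f"Relevant documentation:\n{context}\n\nQuestion: {question}\n"
```

Manually pasting the page, as mentioned above, amounts to skipping `retrieve` and supplying the context yourself.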

Begs the question again: if you need to go out of your way to find an example of correct usage of the api to paste into the prompt, why are you even bothering?

I find Copilot useful when I already know what I want and start typing it out; at a certain point the scope of the problem is narrowed sufficiently for the LLM to fill the rest in. Of course this is more in line with “glorified autocomplete” than the “replacing junior devs” that I keep hearing claims of.

"if you need to go out of your way to find an example of correct usage of the api to paste into the prompt, why are you even bothering?"

Because it's faster.

Here's an example: https://tools.simonwillison.net/ocr

That's an entirely client-side web page you can use to open a PDF which then converts every page to an image (using PDF.js), then runs each image through the Tesseract.js OCR program and lets you copy out the resulting text.

I built the first version of that in about 5 minutes while paying attention to a talk at a conference, by pasting in examples of PDF.js and Tesseract.js usage. Here's that transcript: https://gist.github.com/simonw/6a9f077bf8db616e44893a24ae1d3...

I wrote more about that process here, including the prompts I used: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

That's why I'm bothering: I can produce useful software in just a few minutes, while only paying partial attention to what the LLM is doing for me.
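The OCR page described above follows a simple per-page pipeline: render each PDF page to an image, run OCR on the image, and collect the text. The real tool is client-side JavaScript using PDF.js and Tesseract.js; the Python sketch below only mirrors that shape, with the rendering and OCR steps passed in as hypothetical callables rather than real library calls.

```python
from typing import Callable, Iterable, TypeVar

Page = TypeVar("Page")
Image = TypeVar("Image")


def ocr_document(
    pages: Iterable[Page],
    render: Callable[[Page], Image],
    recognize: Callable[[Image], str],
) -> str:
    """Render each page to an image, OCR it, and join the text page by page."""
    texts = []
    for number, page in enumerate(pages, start=1):
        image = render(page)     # stands in for the PDF.js rendering step
        text = recognize(image)  # stands in for the Tesseract.js OCR step
        texts.append(f"--- page {number} ---\n{text.strip()}")
    return "\n\n".join(texts)
```

Keeping the two steps as injected functions means the loop can be checked with stubs before any real rendering or OCR library is wired in.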

That's a nice little self-contained example. I have yet to see this approach work for the day job: a larger codebase with complex inter-dependencies, where the solution isn't so easily worded (make the text box pink) and where the resulting code is reviewed and tested by one's peers.

We actually had to make a rule at work that if you use an LLM to create a PR and can't explain the changes without using more LLMs, you can't submit the PR. I've seen it almost work - code that looks right but does a bunch of unnecessary stuff, which then requires a real person (me) to clean it up and ends up taking just as much time as if it had been written correctly the first time.

That's one of my personal rules for LLM usage too: "Don't commit code you couldn't explain to someone else" - https://simonwillison.net/2024/Jul/14/pycon/#pycon-2024.062....

It's faster if all you're concerned with can fit in a static HTML file, but what about more complex projects?

I've struggled with getting any productivity benefits beyond single-file contexts. I've started playing with aider in an attempt to handle more complex workflows and multi-file editing but keep running into snags and end up spinning my wheels fighting my tools instead of making forward progress...

Because it still takes 5 mins for it to output the minimum viable change, whereas it’d take me an hour.

Yeah, that's the trick I've been using too, but by that point I get a better result by implementing it myself... of course, I've had two decades of practice and I don't have to communicate what I want lossily to myself, so it's an unfair comparison, but perhaps I've just not found the right use-case yet. I'm sure it exists, I've just not had much luck over the past couple of years (including just this past weekend).

That is far more likely to happen when it is relying on compressed knowledge of documentation and usage for an API it would have seen (comparatively) only a few times in training. That is where the various types of memory, tool calling and supplementary materials being fed in can make them significantly more situationally useful.

The LLMs you mention are first and foremost “general knowledge” machines rather than domain experts. In my opinion, junior developers are the least likely to benefit from their use because they have neither the foundational understanding to know when the approach is wrong, nor the practical experience to correct any mistakes. An LLM can replace a junior dev because we expect the mistakes and potentially poor quality, but you don’t really want a junior developer doing code reviews for another junior developer before pushing code.

The expectation for junior devs will probably change as well and they’d do a lot more shadowing while learning the product. Experience is gained in time.