I work on a large C++ codebase, with large files. Human developers jump around between files with Visual Studio's fuzzy search, set breakpoints to trace execution in the debugger, and use the IDE's refactoring tools.
Microsoft's answer to this was to just ... expose none of this to their Agent Mode!? Replace the working semantic autocomplete with fucking lies!?
Maybe it's changed, I haven't been paying that much attention after bouncing off of this. I've gotten mild acceleration from using gptel-mode in emacs, manually adding references to context, and having models do various mechanical transformations on code. And I've even had some limited success writing tools for it to do LSP lookups.
It frustrates me too; it really feels like the next breakthrough will be when someone gets agents working "natively" with LSP on large codebases.
Anthropic added LSP support to claude-code, but the current implementation is worse than useless, because any changes aren't reflected fast enough, so it's constantly working on outdated views / compilation caches, and it gets in a right muddle between its "internal" state / understanding in context, the real-world file, and the LSP.
If it could just leverage LSP to apply refactorings it would be amazing, but it feels like the LSP can't keep up, and I don't know if that's an LSP problem or a claude problem.
So we binned the LSP plugin and we're back to watching a machine find/replace. Waiting on that is slower than LSP, but it's an "Action => Wait" pattern, which the tooling understands, while LSP is "Possibly wait for LSP to catch up => Action", which it doesn't understand nearly as well.
I suspect the LSP plugins also need better skills that pair with them so it reaches for them more often.
It hurts my soul to see it reach for find/replace to rename a class, complete with mistakes in complex solutions where you might have name clashes across different namespaces. That's something the LSP handles without a problem, but which can trip up an LLM.
I wonder, is the problem here that LSP is updating too slow all the time? Or just that there’s a chance it will update very slow, and you never really know if you’ll hit that chance, so your model always has to do the “long time wait” just in case? It seems like it ought to be possible for LSP to report that it is still processing, in the latter case, somehow…
I'm not an expert, but my reading of the spec is that LSP supports generic `$/` notifications, while there isn't a specific standard for readiness reporting beyond "Initialize / Initialized". That handshake isn't suitable for monitoring ongoing staleness or readiness after a detected file change; the spec treats it as a single, first-time initialization.
There are notifications (e.g. `textDocument/didChange`) that you can send to the LSP to help it along, but again you might end up with a race between the notification from the client making the change and any file-watchers you have running.
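For anyone who hasn't looked at the wire format: the client-side half of that race is just a JSON-RPC notification with a `Content-Length` header. Below is a minimal sketch that builds one, assuming the server negotiated full-document sync (so each change carries the whole new text, not a range edit). The URI and text in the usage are illustrative only. The `version` field is what lets a server tell a newer edit from a stale one.

```python
import json


def frame_did_change(uri: str, version: int, full_text: str) -> bytes:
    """Build a framed LSP textDocument/didChange notification (full-document sync)."""
    message = {
        "jsonrpc": "2.0",
        # A notification: there is no "id" field, so the server sends no response.
        "method": "textDocument/didChange",
        "params": {
            # The version must increase with every change to this document.
            "textDocument": {"uri": uri, "version": version},
            # Full sync: one change event with no "range" replaces the whole document.
            "contentChanges": [{"text": full_text}],
        },
    }
    body = json.dumps(message).encode("utf-8")
    # LSP base protocol: header, blank line, then a UTF-8 JSON body of exactly that many bytes.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```

Usage would be something like `frame_did_change("file:///src/main.cpp", 7, new_text)`, written to the server's stdin.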
I suspect the answer will come in the form of more powerful LSP implementations with generous memory caches, so that disk changes are just another buffered input that can be discarded if already stale. Disk would no longer be the source of truth; the LSP would be, so everything could coordinate through it, operating mostly out of memory.
Another avenue for better success will be more research into faster compilation and better incremental compilation for languages with slower compilation.
Maybe one day we'll even get AI agents directly manipulating syntax trees, with the source code written back merely as a side effect, but that seems like sci-fi compared to the current state of play. LSP is still very document-based, and of course LLMs are also trained on oodles of source.
LSPs only really proactively send diagnostics (error/warning/info/suggest[/code action]).
Everything else is responsive; the client asks for symbols in this document, or completion on this line, etc. And if the client is aware of document changes (which are versioned), it should notify of those before requesting new symbols/etc, but that's not difficult.
I don't know that it's mandatory, but I definitely implemented servers so that they would complete processing changed documents before responding to any later requests.
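To make "finish processing changed documents before answering later requests" concrete, here is a toy sketch. It is not any real server's design: `ToyServer`, `did_change`, `document_symbols` and the one-line "parser" are hypothetical stand-ins. Change notifications are cheap to accept and just get queued; every request drains that queue first, and versions older than what has already been analysed are dropped.

```python
from collections import deque


def toy_symbols(text: str) -> list:
    # Stand-in for real parsing: the word after each "class" keyword is a symbol.
    words = text.split()
    return [words[i + 1] for i, w in enumerate(words[:-1]) if w == "class"]


class ToyServer:
    def __init__(self):
        self.pending = deque()  # (uri, version, text) changes not yet analysed
        self.analysed = {}      # uri -> (version, symbols)

    def did_change(self, uri: str, version: int, text: str) -> None:
        # Accepting a change notification is cheap: just queue the work.
        self.pending.append((uri, version, text))

    def _drain(self) -> None:
        # Finish all queued analysis before answering anything.
        while self.pending:
            uri, version, text = self.pending.popleft()
            current = self.analysed.get(uri, (-1, []))[0]
            if version > current:  # ignore stale or duplicate versions
                self.analysed[uri] = (version, toy_symbols(text))

    def document_symbols(self, uri: str):
        self._drain()
        return self.analysed[uri]
```

The point of the design is that a request can never observe a version older than a change the server has already been told about, so the client doesn't need to guess how long to wait.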
And if it's just the client re-using cached symbols without asking for an update (which should be very fast if nothing has changed); well, that's foolish.
I work in Unity, and I got so frustrated with Claude constantly running gross nested bash/grep/awk/sed loops that took forever that I finally described (and had Claude implement and install) a tool that gathers all this info from a Unity forest of scenes at once. It answers all the questions Claude ever wanted to ask about a Unity project in a single pass that takes 50ms, instead of ten 30-second iterations. It still took a lot of coaching to get it to actually use this tool, but it seems like I’ve convinced it.
Haha yep I’m experimenting with Unreal engine and Codex and it spent 10 minutes while I was AFK confidently trying to build a scene. I load it up and fall through the world. I say “can’t you write a tool to screenshot so you know you’ve done a reasonable job correctly?” and now it does that.
It reminds me of working with a junior dev who was pushing his code to dev and then waiting for it to build on every update, because he couldn’t get it to build locally. Five minutes of my time fixing his config surely saved him hours over the project. He wasn’t a bad dev either!
You have to do a lot of the meta thinking for the agents, because they’ll take an “if all you have is a hammer, everything looks like a nail” approach to their toolkit.
Writing an entire local generated asset pipeline using flux and hunyuan3D-2.1 was a really fun experience. I’ve done software for years but never game dev and it’s just so much fun even if it’s junky little games to impress my kids and get them involved in the creative process.
If it helps, I've found that using context (Claude.md etc.) is way less effective for this type of pattern than using a PreToolUse hook to capture "bad patterns". The hook either transparently rewrites them to "do the right thing" if that is possible statically, or, if not, rejects the tool use with a message that tells the agent "how" to use the intended tooling itself.
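A minimal sketch of the "reject with instructions" half of that approach. It assumes the hook script receives the pending tool call as JSON on stdin, with a Bash call's shell command under `tool_input.command`, and that exit code 2 blocks the call and feeds stderr back to the agent; check the current Claude Code hooks documentation before relying on any of that. The regex and the `unity-query` tool name are hypothetical placeholders for whatever purpose-built tool you have.

```python
import json
import re
import sys

# Hypothetical rule set: patterns the agent should not use, and what to do instead.
# "unity-query" is a made-up name standing in for your own purpose-built tool.
BAD_PATTERNS = [
    (re.compile(r"\b(grep|rg|find)\b.*\.(unity|prefab)\b"),
     "Don't grep Unity scene/prefab files; use the `unity-query` tool instead."),
]


def check_command(command: str):
    """Return None if the command is allowed, else advice for the agent."""
    for pattern, advice in BAD_PATTERNS:
        if pattern.search(command):
            return advice
    return None


def hook_exit_code(raw_event: str) -> int:
    # Assumption: the event JSON carries the shell command at tool_input.command.
    event = json.loads(raw_event)
    command = event.get("tool_input", {}).get("command", "")
    advice = check_command(command)
    if advice is None:
        return 0  # allow the tool call
    print(advice, file=sys.stderr)
    return 2  # assumption: exit code 2 blocks the call and shows stderr to the agent
```

As an actual hook script it would end with `sys.exit(hook_exit_code(sys.stdin.read()))`. Keeping the decision in a plain function makes the rules easy to test without an agent in the loop.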
I wrote a skill and included the instruction in Claude.md to always use this skill if it ever feels like grepping around in Unity files. It still took a lot of reminding it before it did it consistently. I’d interrupt it and say “it looks like you’re using grep for something you have an explicit tool for” and it would go “oh, my bad” and do it right. Took a few days to really sink in.
tool_call is just a fancy wrapper around a black box that executes console commands. Those commands are now the actual backbone of all agentic AI. It feels like the Linux people have been incredibly vindicated on the single-responsibility principle.
Codex did take control of Chrome the other day to run a skill I’d given it for a website without an API. It can do it, but it’s excruciatingly slow compared to the tool calls, for sure.
Isn't this pretty much why language models were invented?
Pasting something directly into the chat interface seems weird, but if you could somehow just see where P(token | context) falls off a cliff, that's a pretty good hint that your writing has a problem.
What would be a better way to incorporate AI as a spell checker?
In comparison to non-AI traditional tools, AI has the advantage of "understanding" the text, reducing the number of "stupid" mis-corrections. And its spelling correctness is usually already impeccable, so what is there to gain by interfacing it with traditional solutions, and how can it be achieved?
Spellchecking is absolutely not a solved problem. I immediately disable spellchecking on every avenue it tries to approach because managing a bunch of dictionaries on every browser/device/application that has its own spellchecker for some godforsaken reason to not have squigglies spammed over every piece of jargon, slang, and slightly atypical spelling is incredibly annoying. I don't know how effective LLMs are, but it's difficult to imagine they can be worse than the existing regime, which is embarrassingly bad for the decades it's been around.
An interesting idea I saw long ago in some book (I thought it was K&P's "Software Tools," or, as a second guess, K&R1, but neither of those panned out; a strong Mandela effect) was a whole-document spellchecker that works purely probabilistically, by histograms: you feed it a document, it tallies the trigraphs (runs of three consecutive letters), and any trigraph that appears only rarely is flagged as a likely typo. This approach lets through unknown-but-realistic words like "antithematory" while flagging unrealistic words like "prisencolinensinainciusol" (because of its unlikely "ciu" and "ius" clusters) and "antthemaory" (because of "ntt" and "aor").
To make this approach work better, feed it a bunch of English text (or whatever language your document is in) before the document you really want to "spellcheck."
Essentially this isn't a spell "checker" so much as a spell "linter" — it looks for antipatterns statistically associated with bugs, and reports the patterns for further investigation.
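A minimal sketch of that trigraph linter, written from the description above rather than from the original book. The lowercase-letters-only tokenization and the `min_count` threshold that defines "rare" are my own assumptions. Priming with a corpus is just concatenating it before counting.

```python
import re
from collections import Counter


def words(text: str) -> list:
    # Assumption: only runs of ASCII letters count as words; case is ignored.
    return re.findall(r"[a-z]+", text.lower())


def trigraphs(word: str) -> list:
    # Every run of three consecutive letters in the word.
    return [word[i:i + 3] for i in range(len(word) - 2)]


def suspicious_words(document: str, corpus: str = "", min_count: int = 2) -> list:
    """Flag document words containing a trigraph seen fewer than min_count times
    across the priming corpus plus the document itself."""
    counts = Counter(t for w in words(corpus + " " + document) for t in trigraphs(w))
    flagged = []
    for w in words(document):
        rare = [t for t in trigraphs(w) if counts[t] < min_count]
        if rare:
            flagged.append((w, rare))
    return flagged


corpus = "the cat sat on the mat " * 3
print(suspicious_words("The cat sat on the mta.", corpus))
# → [('mta', ['mta'])]
```

One consequence of counting the document itself: a typo repeated often enough stops being rare, which fits the "linter, not checker" framing.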
If anyone knows where this trigraph-based "spellchecker" was first presented, I'd love to find out again.
You used to be able to add your own words to spellcheckers; somehow that went out the window. I rarely see the option on a red-lined word in the context menu now, and when it is there, adding the word seems to make no difference at all.
Human copy editors are less than perfect too. I hired one copy editor who I could not trust to be the last person who touched a document before it went out.
I had a friend who wrote an article for the New York Times: the article made a lot of sense before she submitted it, but it was edited for length and style and it definitely read like a New York Times piece but didn't completely make sense.
Try LanguageTool. It's now good enough to show smelling pisstakes inside my IDE, including things like missing articles, without creating tons of visual noise.
I can agree that LLMs might yield better results overall than a standard spellchecker.
If your goal is to check your writing for plausibility and rough grammatical correctness, that's certainly an open problem for deterministic, conventionally-written software tools.
My goal with spell checking is to make sure my occasional mechanical typos while using a desktop computer get caught before someone else has the chance to be annoyed by them.
I don't have an issue with using the wrong word entirely when writing at a computer, so that's not a use case I think about. It does happen when I use a smartphone, due to autocorrect and predictive typing, but that's not a case this Claude skill applies to.
So, for my use case, the ~6 orders of magnitude more energy used to send documents over the network to be hyperchurned on an array of GPUs guzzling electricity is pure waste.
It also makes the whole process orders of magnitude _slower_.
I find that massive waste and slowdown infuriating, even while conceding that it can perhaps deliver a little more value than the deterministic spell-checking algorithms I rely on.
A problem with LLM-based spell checking is that it can alter the actual meaning of the sentence in its quest to improve the spelling. This is a fairly hard problem to fix.
Only if the problem is declared to be whatever it is that spell checkers solve. As the classic joke goes, "Me spell chucker work grate. Need grandma chicken."
>Only if the problem is declared to be whatever it is that spell checkers solve.
The problem being misspelling, hence, "spell checker". Like, this seems pretty straightforward? Grammar checking if you cannot use the language properly is a pretty different problem space, and indeed has long existed and is exposed as a separate thing. And not just in fancy word processors either, if you go to something as simple as macOS TextEdit you'll see separate check boxes for "Check spelling as you type" vs "Check grammar with spelling". If someone wants to try out using LLMs for grammar no problem, but spell checking is purely about the mechanical and, importantly, deterministic aspect of typos or outright non-words.
>As the classic joke goes, "Me spell chucker work grate. Need grandma chicken."
There is a genuine touch of irony/meta in you using that here in this context. That sentence has no misspelled words, and importantly gets across the exact humorous meaning the human who wrote it intended. The joke literally only works because a human was able to make creative use of language. If you had an LLM agent posting for you to HN and it automatically changed that to:
>As the classic joke goes, "My spellchecker works great but could use some grammar checking."
Well, where would the joke be now!? This goes to the exact concern people have with powerful non-deterministic meaning-changing tools replacing deterministic meaning-preserving ones.
I just fed this entire thread (excluding your comment pointing out the joke, and the text mentioning that it was a joke) to an LLM, and it did better than the dictionary spellchecker: it corrected one real error, left my "squigglies" alone (a word the old-hat spellchecker had itself covered in squigglies), and specifically noted, without any prompting in that direction, that it left the joke spelling unchanged. It did not rewrite any sentences. I'm all for determinism where deterministic tools work, but the current implementations are so bad I can't blame people for turning to a non-deterministic program if it's still better on average.
It's not clear whether using "grate" instead of "great" is a grammar mistake or a spelling mistake. I'd argue it's a spelling mistake. The intent was not "my spell checker works a frame of metal bars," it was "my spell checker works well." It just so happens that the misspelled word matches a different word.
An example of a sentence like this with correct spelling but bad grammar would be "my spell checker works good." All of the words are what they're meant to be, but the last word is not the correct part of speech.
But because computers are good at detecting "this doesn't match any known word" and bad at detecting "this matches a word but isn't the word you meant to use here," we've redefined "spell checking" to mean "find words that don't match any known word."
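A toy illustration of that redefinition: look each word up in a word list and suggest dictionary words within a small Levenshtein distance. This is not how aspell or any particular checker works internally; the tiny dictionary and the `max_distance=1` cutoff are assumptions for the example. Note that "grate" passes, because it is a real word.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def check(text: str, dictionary: set, max_distance: int = 1) -> dict:
    """Return {unknown_word: sorted suggestions} for words not in the dictionary."""
    report = {}
    for raw in text.lower().split():
        word = raw.strip(".,!?\"'")
        if word and word not in dictionary:
            report[word] = sorted(w for w in dictionary
                                  if edit_distance(word, w) <= max_distance)
    return report


dictionary = {"my", "spell", "checker", "works", "great", "grate", "well"}
print(check("My spel checker works grate.", dictionary))
# → {'spel': ['spell']}
```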
Your point about the joke is not correct. If I put my comment into ChatGPT and ask for a grammar check, it recognizes that it's a joke with deliberately bad grammar and suggests leaving it alone. If I put my comment into a grammar checker, it flags multiple errors in the joke. And "deterministic meaning-preserving ones"? Traditional spell/grammar checkers may be deterministic, but at no point have they ever been guaranteed to preserve meaning, or even been particularly good at it.
aspell works great. Back in the day I used a DOS TSR, written by an IBM employee, that would spell-check words for you in popular editors. In the 1990s every word processor had a decent spell checker, and they all let you add your own words.
Strong disagree. One of the core strengths of LLMs from the beginning is that they are very good at NOT changing meaning, as long as your model isn't so small that it starts to get "dumb" and as long as your input fits in the context window. (Two known limitations that aren't always exposed to the end user in poorly-written applications.)
Of course, LLMs are non-deterministic and do occasionally make mistakes, so you have to use them correctly and review their output. You shouldn't paste a doc into the web UI and tell it "fix all the mistakes and write the output to a new file." You should instead have it present each mistake and fix to the user as a diff and let the user approve or deny, either within the application or allowing the user to make their own edits. Never let it "rewrite" the whole document, that's the document-editing equivalent of giving OpenClaw root on your personal computer. Nothing good will come of it.
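A minimal sketch of that "present each fix as a diff and let the user approve or deny" loop, using Python's standard `difflib`. How the model produces its `(old, new)` fixes is out of scope here, and `approve` is a hypothetical callback standing in for whatever UI asks the user.

```python
import difflib


def proposed_fix_diff(original: str, corrected: str, name: str = "doc.txt") -> str:
    """Render one proposed correction as a unified diff for the user to review."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        corrected.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


def review(original: str, fixes, approve) -> str:
    """Apply (old, new) fixes one at a time, each only if approve(diff) is True.
    The document is never rewritten wholesale."""
    text = original
    for old, new in fixes:
        candidate = text.replace(old, new, 1)
        if candidate != text and approve(proposed_fix_diff(text, candidate)):
            text = candidate
    return text


doc = "Their is a typo here.\nSecond line.\n"
fixes = [("Their is", "There is"), ("Second", "2nd")]
# Stand-in for a user who accepts the first fix and rejects the second.
print(review(doc, fixes, approve=lambda diff: "-Second line." not in diff))
```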
Classic spell checkers can't detect homophones. E.g. "there" and "their." Grammar checkers can, but at least the ones that I have used also like to change the tone of my writing to sterile corporate PC speak. LLMs used for grammar checking have not, in my experience, meddled with my tone. (Although sometimes they try to admonish me for it!)
> Grammar checkers can, but at least the ones that I have used also like to change the tone of my writing to sterile corporate PC speak.
Most grammar checker packages also include style checking, and the default options tend toward that style (because that’s the big market for them.) Most of them are also configurable, so you can disable style checking entirely while still checking grammar, or tweak which style rules are applied.
>What would be a better way to incorporate AI as a spell checker?
You just don't need AI to do spell checking. It's a waste of energy, bandwidth and tokens. It's like Java Enterprise Fizz-Buzz - 1000x more complicated than it needs to be and complete overkill.
But at least you can tell your manager you're using AI!
AI certainly is the shiny new hammer, and it is tempting to see the world as nails.
Traditional methods might not be perfect, but they also easily fit in the memory of even low power devices. Perhaps it isn't a problem worth burning a dollar of tokens for every spelling mistake.
The fact that it produces correctly spelled words says nothing about its ability to find spelling mistakes, or to correct them without errors like completely changing the word.
> nobody [wants to use AI] to augment already working solutions
Plenty of people do, but that only produces a blog post that will get you to the front page of HN. If you want VCs to drop $40M on your head, you need to pretend to reinvent the world.
Then, to further appease the rain gods, you need to sue the bloggers on the front page of HN who are challenging your world-changing narrative. Which will, heh, drop you on the front page of HN.
Our community is, literally, eating itself at this point. There was a time when we actually took "make something people want" literally. Now it's just part of the fiction.
I think the Bitter Lesson is severely misapplied in the current situation: if progress from "just add more resources" is very slow, and a huge amount of money is at stake, continuous work on hand-engineering can give a continuous and very valuable competitive advantage.
The labs all seem to be going for AGI through bigger LLMs, and I am reasonably sure that it's not going to happen like that.
> The labs all seem to be going for AGI through bigger LLMs
I don't know if this is still the case. Labs like Anthropic and OpenAI are spending a huge amount of their time on custom model wrappers, something they used to leave to their customers.
It's never occurred to me to even try getting an LLM to design or layout a circuit for me.
Instead, I have dozens or hundreds of chats in my history where I debate the merits of different parts for different tasks and scenarios, dig into the nuances of decoupling strategies (package size vs derating), and work out resistor network ratios from the reels I have on hand.
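The "resistor ratios from the reels on hand" part is the kind of arithmetic that is easy to pin down, which makes it handy for checking whatever an LLM suggests. Here is a minimal brute-force sketch for a two-resistor voltage divider. It assumes an unloaded divider (Vout/Vin = R_bottom / (R_top + R_bottom)) and ignores tolerance; the on-hand values below are made up for the example.

```python
from itertools import product


def best_divider(target_ratio: float, on_hand: list) -> tuple:
    """Pick (r_top, r_bottom) from on-hand values whose unloaded divider ratio
    r_bottom / (r_top + r_bottom) is closest to target_ratio."""
    return min(product(on_hand, repeat=2),
               key=lambda pair: abs(pair[1] / (pair[0] + pair[1]) - target_ratio))


reels = [1000, 2200, 4700, 10000]  # ohms; hypothetical reels on hand
top, bottom = best_divider(3.3 / 5.0, reels)  # e.g. 5 V down to roughly 3.3 V
print(top, bottom, round(5.0 * bottom / (top + bottom), 2))
# → 4700 10000 3.4
```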
Then being able to feed an LLM a datasheet and have it write a custom driver against the registers I need so that it does exactly what I want without the cognitive overhead of a buggy package with someone else's strong opinions about how a part should be used is amazing.
Frontier models are incredibly good at electronics, and it's got nothing to do with what happens inside the EDA.
Design, no... but I've definitely thought about letting one route traces. While autorouters work, I was hoping Claude could do matched traces better. At the time, though, it didn't want to generate the KiCad pcbnew file. /shrug
Everyone is different, but board layout is one area where I aggressively don't want any LLM input until such a time as it is as good at board layout as it is at refactoring code.
We're still a ways off from that, and that's likely because board layout requires a much more nuanced perspective of the enclosure shape, power requirements, heat dissipation, RF...
It's really not about placing ICs with caps nearby. I actually really enjoy that part anyhow. That's the fun part!
A few days ago someone on HN commented that a teammate uses Claude to search for text in files on their own computer. Buddy... There's "Command-line Tools Can Be 235x Faster Than Your Hadoop Cluster", and then there's "Command-line Tools Can Be ∞ Faster Than Your AI".
As snark, I've been using the phrase "ask GPT about it" for things that clearly do not need an LLM to be involved. The other day, I was on a Zoom call and said it, only to see the presenter actually doing it. I hope my unmuted laugh wasn't too distracting.
There are many domains where a hybrid of numeric and AI approaches would make sense. For example in those domains where there's already a rich practice of numeric tools such as with IC layout.
If it's any consolation: once we've burnt the last crumb of coal, the last drop of oil and the last bit of natural gas to fuel the AI overlords, that particular problem will take care of itself.