As a freelancer I do a bit of everything, and I’ve seen places where an LLM breezes through and gets me what I want quickly, and times where using an LLM was a complete waste of time.
For sure. The more specialized or obscure the things you have to do, the less LLMs help you.
Building a simple marketing website? Probably don’t waste your time - an LLM will probably be faster.
Designing a new SLAM algorithm? LLMs will probably spin around in circles helplessly. That being said, that was my experience several years ago… maybe the state of the art has changed in the computer vision space.
> The more specialized or obscure the things you have to do, the less LLMs help you.
I've been impressed by how this isn't quite true. A lot of my coding life is spent in the popular languages, which the LLMs obviously excel at.
But a random robotics language that dates to the 80s (Karel)? I unfortunately have to use it sometimes, and Claude ingested a PDF manual for the language that is hundreds of pages long, and now it's better at it than I am. It doesn't even have a compiler to test against, and still it rarely makes mistakes.
I think the trick with a lot of these LLMs is just figuring out the best techniques for using them. Fortunately a lot of people are working all the time to figure this out.
Agreed. The sentiment you are replying to is a common one and is just people self-aggrandizing. No, almost nobody is working on code novel enough to be difficult for an LLM. All code projects build on things LLMs understand very well.
Even if your architectural idea is completely unique... a never-before-seen magnum opus, the building blocks are still Legos.
Specialized is probably not the word I'd use, because LLMs are generally useful for understanding more specialized / obscure topics. For example, I've never randomly heard people talking about the DICOM standard, yet LLMs have no trouble with it.
I think there is a sweet spot for the training(?) on these LLMs where there is basically only "professional" level documentation and chatter, without the layman stuff being picked up from reddit and github/etc.
I was trying to remember/figure out some obscure hardware communication protocol so I could enumerate a hardware bus on some servers. Feeding Codex a few RFC URLs and other such information, plus telling it to search the internet, resulted in extremely rapid progress vs. having to wade through 500 pages of technical jargon and specification documents.
I'm sure if I was extending the spec to a 3.0 version in hardware or something it would not be useful, but for someone who just needs to understand the basics to get some quick tooling stood up it was close to magic.
The standard for obscurity is different for LLMs: something can be very widespread and public without the average person knowing about it. DICOM is used at practically every hospital in the world, there are whole websites dedicated to browsing the documentation, companies employ people solely for DICOM work, there are popular maintained libraries for several different languages, etc., so the LLM has an enormous amount of it in its training data.
The question relevant for LLMs would be "how many high quality results would I get if I googled something related to this", and for DICOM the answer is "many". As long as that is the case, LLMs will not have trouble answering questions about it either.
One tendency I've noticed is that LLMs struggle with creativity. If you give them a language with extremely powerful and expressive features, they'll often fail to use them to simplify other problems the way a good programmer does. Wolfram is a language essentially designed around that.
I wasn't able to replicate that in my own testing though. Do you know if it also fails for "mathematica" code? There's much more text online about that.
> Building a simple marketing website? Probably don’t waste your time - an LLM will probably be faster.
This is actually where I would be most reluctant to use an LLM. Your website represents your product, and you probably don’t want to give it the scent of homogenized AI slop. People can tell.
They can tell if you let it use whatever CSS it wants (Claude will nearly always make a purple or blue website with gross rainbow gradients). They can also tell if you let it write your marketing copy.
If you decide on your own brand colors and wording, there’s very little left about the code that can’t be done instantly by an LLM (at least on a marketing website).
Some subscriptions offer "unlimited tokens" for certain models, e.g. GitHub Copilot can be unlimited for GPT-4o and GPT-4.1 (and, actually, GPT-5 mini!). So: I spent some time with those models to see what level of scaffolding and breaking things down (hand holding) was required to get them to complete a task.
Why would I do that? Well, I wanted to understand more deeply how differences in my prompting might impact the outcomes of the model. I also wanted to get generally better at writing prompts and, of course, at controlling context and seeing how models can go off the rails. Just by being better at understanding these patterns, I feel more confident in general about when and how to use LLMs in my daily work.
I think, in general, understanding not only that earlier models are weaker, but also _how_ they are weaker, is useful in its own right. It gives you an extra tool to use.
I will say, the biggest "weaknesses" I've found are in training data. If you're keeping your libraries up-to-date, and you're using newer methods or functionality from those libraries, AI will constantly fail to recognize those new things. For example, Zod v4 came out recently and the older models absolutely fail to understand that it uses some different syntax and methods under the hood. Jest now supports `using` syntax for its spyOn method, and models just can't figure it out. Even with system prompts and telling them directly, the existing training data is just too overpowering.
I would say they are not changing but evolving and you evolve with them.
For example: Gemini became a lot better at a lot more tasks. How do I know? Because I also have very basic benchmarks, or let's say "things which haven't worked" are my benchmark.
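For anyone who wants to keep a "things which haven't worked" list in a more repeatable form, here is a minimal sketch of how it could look. Everything in it is an assumption for illustration: `Case`, `run_benchmark`, and `newly_passing` are made-up names, and `model` is just any function that takes a prompt and returns text, so you would wrap whatever client you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One task that failed in the past (hypothetical structure)."""
    name: str
    prompt: str
    passes: Callable[[str], bool]  # True when the reply is acceptable


def run_benchmark(model: Callable[[str], str], cases: List[Case]) -> Dict[str, bool]:
    """Send every prompt to the model and record pass/fail by case name."""
    results: Dict[str, bool] = {}
    for case in cases:
        try:
            reply = model(case.prompt)
            results[case.name] = bool(case.passes(reply))
        except Exception:
            # A crash in the model call or the checker counts as a failure.
            results[case.name] = False
    return results


def newly_passing(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Names of cases that failed in the earlier run but pass in the later one."""
    return sorted(
        name for name, ok in after.items() if ok and not before.get(name, False)
    )


if __name__ == "__main__":
    # Stand-in "models" so the sketch runs without any API access.
    def old_model(prompt: str) -> str:
        return "I am not sure."

    def new_model(prompt: str) -> str:
        return "READY" if "READY" in prompt else "I am not sure."

    cases = [
        Case("echo_ready", "Reply with the word READY", lambda r: "READY" in r),
        Case("never_solved", "Something still too hard", lambda r: "solved" in r),
    ]
    before = run_benchmark(old_model, cases)
    after = run_benchmark(new_model, cases)
    print("newly passing:", newly_passing(before, after))
```

The checkers here are simple substring tests, which is crude but matches the "very basic benchmarks" idea: the point is only to rerun the same list when a new model version ships and see what moved.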
Honestly I think this is the primary explanation for why there is so much disagreement over whether LLMs are useful or not, particularly if you leave out the more motivated arguments.