| HN Mirror


by torben-friis 4 days ago

My career path is surprisingly similar to the author's. Weirdly enough, what he takes as the first pillar to fall is the one I see most undamaged currently.

LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations. They're great at refactoring, translating between languages, even tracing bugs in existing code, but there are always many things subtly wrong when iterating on and expanding our domain.

This might be because the companies I worked for happen to be tackling complex domains precisely for moat-building reasons. They stay in business explicitly because there's not a book out there you can read to build a clone, the knowhow stays inside.

Also, a fintech whose managers recommend speeding up design docs with AI sounds way too careless to be in the money handling business. It's way, way too easy to end up with millions incorrectly allocated, particularly if you deal with high volumes of small transactions. These bugs are always a bitch to deal with because correcting the logic is just step one: you then have to correct all the wrongly calculated data in immutable DBs, work through the red tape and client comms, and your fix is bound to become a gotcha that new features and observability have to take into account ("remember that there's a bump in the data on February 2 because we had incident X").

6 comments

odeono 4 days ago

This. Once you're building something that genuinely hasn't been built before, LLMs cannot be trusted with any architectural decisions. I'm building a product based around various physics simulations, so it's purely first principles, but without active research, thinking, and challenging, it produces computational code literally hundreds of orders of magnitude slower WHILE implementing absurd fallbacks and shortcuts that effectively result in a useless calculation.

This is the case perhaps 95% of the time.

Oversight is very important, and architectural thinking cannot yet be outsourced, only execution.

throw-the-towel 4 days ago

How many of us here are building something genuinely new, though?

batshit_beaver 4 days ago

Hopefully everyone? Else your job could have been outsourced or replaced by a junior with access to Google and StackOverflow way before LLMs (it just wasn’t due to zero interest rates and proliferation of bullshit jobs in tech companies).

jckahn 4 days ago

That's not been my experience. Most professional software is just a CRUD app in one form or another.

jimbokun 4 days ago

Yes and it’s probably better for society as a whole that all of those can now be vibe coded by someone who is not a full time developer.

bobro 3 days ago

>literally hundreds of orders of magnitude slower

I'm sure this is just a slip of the tongue (finger), but the idea of being a numerical googol times slower is funny.
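To put a number on the joke, here is a minimal sketch of the arithmetic. The function name, the 1 ns baseline, and the rounded age of the universe are illustrative assumptions, not figures from the thread.

```python
import math

# A googol is 10**100, i.e. exactly one hundred orders of magnitude.
GOOGOL = 10 ** 100

# Rough age of the universe in seconds (about 13.8 billion years); an approximation.
AGE_OF_UNIVERSE_SECONDS = 4.35e17


def orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many powers of ten separate two runtimes."""
    return math.log10(slow_seconds / fast_seconds)


# A hypothetical 1 ns operation slowed down by a factor of a googol.
baseline_seconds = 1e-9
slowed_seconds = baseline_seconds * float(GOOGOL)

# How many universe lifetimes the slowed operation would take.
universe_lifetimes = slowed_seconds / AGE_OF_UNIVERSE_SECONDS

print(f"slowdown: {orders_of_magnitude(slowed_seconds, baseline_seconds):.0f} orders of magnitude")
print(f"runtime: about {universe_lifetimes:.1e} times the age of the universe")
```

For comparison, a 100x slowdown is two orders of magnitude, and a million-fold slowdown is six.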

lowbloodsugar 4 days ago

Sure. But where do you think AI will be in a year? Or do you think that AI is just an advanced Markov chain? Like “AI will never be able to write code. Ok AI will never be able to debug code. Ok AI will never be able to write design docs. Ok AI will never be able to architect. Ok AI will never be able to do distributed systems architecture. Ok AI will never be able to design new products completely from scratch. Ok AI will never be able to run a company. Ok AI will never be able to run a city. Ok AI will never be able to run a government. Ok AI will never be able to run the world economy…” It’s Robin Williams' Gaddafi sketch: “Ok you cross this line you die!” [1].

[1] https://youtu.be/GCOOOyuTBzA?si=YnCWH9LJqb_yYolG

anon7725 4 days ago

And to close the loop - there is no architectural thinking without experience in execution. The highly productive people who are all-in on agentic coding today are powered by their previous experience doing implementation. As time goes on their powers will wane unless they make a point to keep them sharp by doing enough hands-on implementation.

It’s the same as a “non-coding architect” role (remember those?). Most of them are absolutely full of shit architecture astronauts.

ex-aws-dude 4 days ago

I find that too, Claude Code is constantly trying to break the architecture patterns and do hacky stuff

Like it's only focused on solving the local problem as easily as possible

physicsguy 4 days ago

I have had a similar experience when trying it too. I couldn't even drive Claude Opus 4.7 to get PETSc to compile properly (with all the optional dependencies)

mellosouls 4 days ago

> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

This is domain expertise - software engineers are not needed for that. Ofc often senior sws are expert in it, but they aren't necessary.

Traditionally it's been useful for frictionless production for engineers to be able to do maybe 90% of their work without consulting the business experts, but this is the whole crux of the moment TFA discusses: "tradition" is over.

In this new world it's now the job of a senior engineer not to have this domain expertise themselves, but to know how to ensure the agents have it, or can acquire it in a verifiably correct way.

Senior engineers who hang on to the idea that their advanced business domain expertise makes them safe will soon be as dead in the water as juniors who haven't pivoted.

torben-friis 4 days ago

>This is domain expertise - software engineers are not needed for that. Ofc often senior sws are expert in it, but they aren't necessary.

Our engineers frequently need to be in the loop with product and stakeholders: Due to real world messiness, many times the only true answer to "how does this currently work" is in the code. Enabling product and stakeholders to fetch that knowledge would be a giant time saver, so we've experimented with LLMs.

I recommend you try this exercise: place a non technical person in front of a complex business' codebase with an agent in between and get them to extract or shape business knowledge through it.

I'm serious, it's not a rhetorical device, genuinely do try with a coworker or a friend. Seeing how their approach to the problem differs from yours will teach you a lot.

All our attempts failed miserably.

mellosouls 3 days ago

"...place a non technical person..."

I'm not suggesting that and agree it would fail. Engineer expertise is important, but not in the old way.

mikeocool 4 days ago

> This is domain expertise - software engineers are not needed for that.

I want to work with the business domain experts you work with. The ones I’ve worked with are experts in their domain, not modeling that domain in software.

Left to their own devices with Claude Code, they produce some great POCs. Then those POCs buckle under their own weight as they pile on contradicting requirements and have Opus spinning to fix bugs.

Maybe the models will get good enough to solve for this, but they’re not there yet.

mellosouls 3 days ago

As in my reply to the sibling comment, I am not disputing that engineer expertise remains important; I'm saying the nature of it is now different and will continue to rapidly change in its place in the business stack.

causal 4 days ago

I can't even get Claude or GPT-5 to consistently produce good flows for common use cases, much less domain-specific shit. They have deep vocabulary though, which makes them sound better informed than they are.

They are very good at writing code and debugging visible errors, but that's like 50% the harness.

enraged_camel 4 days ago

>> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

My company also deals with a lot of complex regulations and domain-specific system implementations, which AIs used to struggle with. We were able to solve the problem with well-organized claude.md/agents.md files. On top of that we also implemented supermemory.ai, so newly made decisions are always recalled by AI agents when starting new sessions.

worldthruword 4 days ago

> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

Would a skill which forces you and the LLM to reach a shared understanding of the product features and the regulations those features are supposed to capture be of help here? The main idea is we provide documents to the LLM and it asks lot of questions which clear ambiguity and possible misconceptions the LLM might have. I'd suggest taking a look at skills. They are really helpful.

https://www.youtube.com/watch?v=6BB6exR8Zd8

rdedev 4 days ago

> The main idea is we provide documents to the LLM and it asks lot of questions which clear ambiguity and possible misconceptions the LLM might have

This kind of works, but the difficulty is that you have to be very explicit about everything. A spec document mentioned that a particular Excel file is treated as a source of truth throughout the whole company and as an append-only database. The agent still decided to add a check to see if a previous row was modified. It pushed back when asked why it had decided to do so: "What if someone entered it wrong and had to correct it?" Valid question, but it's not my team's responsibility to check for it.

This check makes sense from a traditional development view point and that's why the agent did it. I would say it's good practice too but it's beyond the scope of the project it was working on. If what you are doing is beyond the norm you have to watch out for things like this
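For readers wondering what such a check looks like, here is a minimal sketch, assuming the sheet has already been loaded into a list of row tuples and that an earlier snapshot is available. The function names and the snapshot mechanism are hypothetical; the comment does not describe the agent's actual code.

```python
from typing import Sequence

Row = tuple  # one spreadsheet row, already loaded as a tuple of cell values


def find_modified_rows(previous: Sequence[Row], current: Sequence[Row]) -> list[int]:
    """Return indices of rows from the earlier snapshot that changed or vanished."""
    modified = []
    for index, old_row in enumerate(previous):
        # A row counts as modified if it is missing or no longer identical.
        if index >= len(current) or current[index] != old_row:
            modified.append(index)
    return modified


def appended_rows(previous: Sequence[Row], current: Sequence[Row]) -> list[Row]:
    """Return the rows added since the earlier snapshot."""
    return list(current[len(previous):])


# Hypothetical data: row 1 was edited in place and one new row was appended.
snapshot = [("2024-01-01", 100), ("2024-01-02", 250)]
latest = [("2024-01-01", 100), ("2024-01-02", 275), ("2024-01-03", 90)]

print(find_modified_rows(snapshot, latest))  # indices of edited rows
print(appended_rows(snapshot, latest))       # rows added since the snapshot
```

If the spec guarantees the file is append-only, only `appended_rows` is in scope; `find_modified_rows` is the extra guard the agent chose to add.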

causal 4 days ago

Sure but finding their shortcomings and patching them with skills takes real trial and error. They are incapable of identifying their own shortcomings for you.

athrowaway3z 4 days ago

> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

So there is a spectrum here, and I don't know what I don't know, meaning I can just be wrong. But we're both on that spectrum, and are you sure it's not a skill issue?

All of the specifics you list seem so fundamental that in similar projects I've inserted them straight into the AGENTS.md, or added a strong reference to where to look them up.

If you boil it down to it, you're quite literally saying the problem is the LLMs dont have access to a bunch of facts.

torben-friis 3 days ago

>If you boil it down to it, you're quite literally saying the problem is the LLMs dont have access to a bunch of facts.

Well yeah, and our problem with mortality is not having access to a bunch of medical facts :)

Kidding aside, it's a fair question. We have several problems with that approach:

One, today they miss X, tomorrow Y. You can iteratively add information and get better, but everyone who's had to keep a large company's documentation updated and consistent knows how hard a problem that is. Still, this is not the main issue.

Two, knowledge extraction is not clean. We face this daily. "There was no incident on may 12" could mean any of:

- "There was no incident on may 12"

- "There was an incident I was not aware of"

- "There was an incident, but I'm a contractor who has to pay if there's an incident so I'm not admitting shit"

- "There may have been an incident, who knows, I secretly told chatgpt to handle this task for me"

- "Something went wrong but I don't consider it an incident because that particular error has been popping up every wednesday since I joined the company and I was told to ignore it"

- "there was an incident when I touched something you told me not to touch so I will firmly deny there was an incident"

You won't get the LLM to navigate that human problem. You might think that's tech debt and dysfunctionality, but it is real life. It's the same problem as with self driving cars, it's semi easy until you introduce toddlers running after a ball in the middle of a road, drunk drivers and unfixed potholes.

Three, and this is the main issue, surfacing. Skills, agents, etc. work for obvious connections like "I'm writing a test => we test with framework x in a style y". They do not work as well for indirect connections like: "If I correct the interest on these past payments, for a minority of them it might raise the total amount above a certain threshold where we were supposed to have required extra information due to money laundering regulations, and I need to contact legal to see what we do since it's not possible to request the extra info after the fact"

The problem is that the set of things to potentially surface is giant and LLMs fail miserably at connecting what to surface where. It's what we usually refer to as the "spidey sense"/"shitdar" of senior engs. LLMs might get better with time, but so far the ability isn't there.
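The interest-correction example above is easy to check once someone knows to look for it, which is the point: the hard part is surfacing it. A minimal sketch of the check itself, assuming a made-up threshold and hypothetical field names (neither is a real regulatory figure or a real schema):

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical threshold for illustration only; not a real regulatory figure.
EXTRA_INFO_THRESHOLD = Decimal("10000.00")


@dataclass(frozen=True)
class Payment:
    payment_id: str
    principal: Decimal
    interest_as_booked: Decimal   # the wrongly calculated interest
    interest_corrected: Decimal   # the interest after the fix

    @property
    def total_as_booked(self) -> Decimal:
        return self.principal + self.interest_as_booked

    @property
    def total_corrected(self) -> Decimal:
        return self.principal + self.interest_corrected


def newly_over_threshold(
    payments: list[Payment], threshold: Decimal = EXTRA_INFO_THRESHOLD
) -> list[str]:
    """Return ids of payments that only exceed the threshold after correction."""
    return [
        p.payment_id
        for p in payments
        if p.total_as_booked <= threshold < p.total_corrected
    ]


# Hypothetical payments: only "p1" crosses the threshold because of the fix.
payments = [
    Payment("p1", Decimal("9990.00"), Decimal("5.00"), Decimal("15.00")),
    Payment("p2", Decimal("5000.00"), Decimal("5.00"), Decimal("15.00")),
    Payment("p3", Decimal("10100.00"), Decimal("5.00"), Decimal("15.00")),
]

print(newly_over_threshold(payments))  # candidates to raise with legal
```

`Decimal` is used instead of floats so that the comparison against the threshold is exact for monetary amounts.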