

by neuronic 304 days ago

How are you getting these results? Even with grounding in sources, careful context engineering and whatever technique comes to your mind we are just getting sloppy junk out of all models we have tried.

The sketchy part is that LLMs are super good at faking confidence and expertise, all while randomly injecting subtle but critical hallucinations. This ruins basically all significant output. Double-checking and babysitting the results is a huge time and energy sink. Human post-processing negates nearly all benefits.

It's not like there is zero benefit to it, but I am genuinely curious how you get consistently correct output for a "complicated subject matter like insurance".

5 comments

bdangubic 304 days ago

I genuinely think that the biggest issue with LLM tools is that most people expect magic because first attempts at some simple things feel magical. however, they take an insane amount of time to get expertise in. what is confusing is that I think SWEs spend immense amounts of time in general learning the tools of the trade but this seems to escape a lot of people when it comes to LLMs. on my team, every developer is using LLMs all day, every day. on average based on sprint retros each developer spends no less than an hour each day experimenting/learning/reading… how to make them work. the realization we made early is that when it comes to LLMs there are two large groups:

- group that sees them as invaluable tools capable of being an immense productivity multiplier

- group that tried things here and there and gave up

we collectively decided that we want to be in the first group and were willing to put in the time to be in that group.

danpalmer 304 days ago

I'm persisting, have been using LLMs quite a bit for the last year, they're now where I start with any new project. Throughout that time I've been doing constant experimentation and have made significant workflow improvements throughout.

I've found that they're a moderate productivity increase, i.e. on a par with, say, using a different language, using a faster CI system, or breaking down some bureaucracy. Noticeable, worth it, but not entirely transformational.

I only really get useful output from them when I'm holding _most_ of the context that I'd be holding if writing the code, and that's a limiting factor on how useful they can be. I can delegate things that are easy, but I'm hand-holding enough that I can't realistically parallelise my work that much more than I already do (I'm fairly good at context switching already).

lomase 304 days ago

I have been in teams that do this and in teams that don't.

I have not seen any tangible difference in output between the two.

bdangubic 304 days ago

year-over-year we are at around a 45% increase in productivity and this trajectory is on an upward slope

danpalmer 304 days ago

How are you measuring increased productivity? Honest question, because I've seen teams claim more code, but I've also seen teams say they're seeing more unnecessary churn (which is more code).

I'm interested in business outcomes, is more code or perceived velocity translating into benefits to the business? This is really hard to measure though because in pretty much any startup or growing company you'll see better business outcomes, but it's hard to find evidence for the counterfactual.

bdangubic 304 days ago

same way we have for a decade, since before LLMs - story points. we move faster now, we have automated stuff we could never automate before. same project, largely same team since 2016, we just get a lot more shit done, a lot more

gedy 304 days ago

So something like: automating unit tests, where the tests are worth X points and you wouldn't have done them before?

Not snarking, but if they are automated away, then isn't this like 0 story points for effort/complexity?

danpalmer 303 days ago

I'm glad you're more productive, although I would question this result both in terms of objectivity (story points are typically very subjective), and in terms of capturing all externalities of the LLM workflow. It's easy to have "build the thing", "fix the thing", "remove tech debt in the thing", "replace the thing" be 4 separate projects, each with story points, where "build the better thing" would have been one, and churn is something that is evidenced with LLM development.

lomase 304 days ago

This reads like the bullshit bulletpoints people write on their CV.

bdangubic 304 days ago

comments like this give me a warm and fuzzy feeling that theoretically we compete for the same jobs - no worries about job security for the foreseeable future :)

lomase 304 days ago

Someone's ego got hurt.

vivzkestrel 304 days ago

don't you think you would be better off getting that expertise in actual system design, software engineering and all the programming-related fields? by involving ChatGPT to make code, we'll eventually lose the skill to sit and craft code like we used to do all these years. after all, the brain's neural pathways only remember what you put to work daily

caseyf7 304 days ago

Where are you finding the best material for reading/learning?

bdangubic 304 days ago

- everything that simon writes (https://simonwillison.net/)

- anything that goes deep into issues (I seldom read “i love llms” type posts; something like this is great: https://blog.nilenso.com/blog/2025/09/15/ai-unit-of-work/)

- lots of experimentation - specifically I have spent hours and hours doing the exact same feature (my record is 23 times).

- if something “doesn’t work” I create a task immediately to investigate and understand it. even the smallest thing that bothers me I will spend hours on to figure out why it might have happened (this is sometimes frustrating) and how to prevent it from happening again (this is fun)

My colleague describes the process as a JavaScript developer trying to learn Rust while tripping on mushrooms :)

oblio 304 days ago

> It's not like there is zero benefit to it, but I am genuinely curious how you get consistently correct output for a "complicated subject matter like insurance".

Most likely by trying to get a promotion or bonus now and getting the hell out of Dodge before anyone notices those subtle landmines left behind :-)

fn-mote 304 days ago

Cynical, but maybe not wrong. We are plenty familiar with ignoring technical debt and letting it pile up. Dodgy LLM code seems like more of that.

Just like tech debt, there's a time for rushing. And if you're really getting good results from LLMs, that's fabulous.

I don't have a final position on LLMs, but it has only been two days since I worked with a colleague who definitely had no idea how to proceed when they were off the "happy path" of LLM use, so I'm sure there are plenty of people getting left behind.

0000000000100 304 days ago

Wow the bad faith is quite strong here. As it turns out, small to mid sized insurance companies have some ridiculously poorly architected front ends.

Not everyone is the biggest cat in town with infinite money and expertise. I have no intention of leaving anytime soon, so I have confidence that the code that was generated by the AI (after confirming with our guy who is the insurance OG) is a solid improvement over what was there before.

oblio 303 days ago

The bad faith is super strong when it's being swamped by a lot more bad faith driven by greed. I'm not talking about you, but about all these companies with overnight valuations in the billions and their PR machines.

To your example, frankly, I would have started with that very important caveat, of an initial situation defined by very poor quality. It's a very valid angle, as a lot of code that's available today is of very low quality, and if AI can take 1/10 or 2/10 and make it 5/10 or 6/10, yes, everyone benefits.

gamblor956 304 days ago

A lot of programmers that say that LLMs are awesome tend to be inexperienced, not good programmers, or just gloss over the significant amount of extra work that using LLMs requires.

Programmers tend to overestimate their knowledge of non-programming domains, so the OP is probably just not understanding that there are serious issues with the LLM's output for complicated subject matters like insurance.

cjbarber 304 days ago

What are you trying to use LLMs for and what model are you using?

0000000000100 304 days ago

Depends a lot. Use it for one-off scripts, particularly for anything Microsoft 365 related (expanding SharePoint drives, analyzing AWS usage, general IT stuff). Where there is a lot of heavy context-based business logic it will fail since there’s too much context for it to be successful.

I work in custom software where the gap between non-LLM users and those who at least roughly know how to use it is huge.

It largely depends on the prompt though. Our ChatGPT account is shared so I get to take a gander at the other usages and it’s pretty easy to see: “okay this person is asking the wrong thing”. The prompt and the context have a major impact on the quality of the response.

In my particular line of work, it’s much more useful than not. But I’ve been focusing on helping build the right prompts with the right context, which makes many tasks actually feasible where before it would be way out of scope for our clients’ budgets.

torben-friis 304 days ago

Could you give an example of a prompt?

yeasku 304 days ago

You are a top stackoverflow contributor with 20 years of experience in...

torben-friis 304 days ago

I meant an example of the prompts he was attempting, in case it helped provide advice.