HN Mirror


by SuperV1234 10 days ago

Is that all that Mythos did?

Did it find any real potential issues or optimization/simplification opportunities, or spark any thought-provoking discussion within your organization?

Or was it purely a net negative experience?

2 comments

margalabargala 10 days ago

Read their comment. It's a negative anecdote surrounded by them using genAI all the time.

You're the only one coming away thinking there was a net negative experience.


troupo 10 days ago

In regulated industries none of those matter if the tool invents compliance issues or breaks compliance.

The only thought-provoking discussion should be "why the hell do we have this stochastic parrot anywhere near our codebase".


bloaf 10 days ago

I think that what technical people fail to understand is that a lot of the time, "compliance" is not the same as a binary compiles/does not compile. For a lot of rules/regulations, compliance means "making enough effort that legal is willing to back you up".

A system which will just randomly decide to give the legal team reasons to not back you up is:

* A system whose output will get brought up in lawsuits and make legal's job harder.

* A system that will make the dev team perpetually chase its tail while it oscillates between the several different valid interpretations of the rules.


brookst 10 days ago

Odd take. So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done, the whole thing is useless?

Not saying that is the situation, I don’t know. But if “one error is too many” is your point of view… do you think the humans in these orgs are 100% perfect 100% of the time?


troupo 10 days ago

> So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done

How many gaps have humans not caught?

> But if “one error is too many” is your point of view

Yes, in regulated industries "one error is too many" is the only right approach.

Yes, humans also make errors, and there you have a range of options: from tracing and finding the causes of the error (and tightening processes) to literally jailing those responsible. Your hallucination machine will happily "identify" 17 gaps, and create 34 more. And no, there are no processes to make it better. The "make no mistakes" incantation will happily be ignored for obvious reasons, regardless of how many forms of it you throw at it.


ToValueFunfetti 10 days ago

It doesn't seem like you're engaging with the material circumstances described above. What does it mean for a human to not catch that a part of a codebase is actually compliant with regulations? What does it mean for the hallucination machine to create 34 more gaps when it doesn't appear to have more than read access? How would it not be useful to have a machine that identifies 17 real crimes that your highly regulated business is unintentionally committing even with a 90% false positive rate?


troupo 9 days ago

In a regulated industry a 90% false positive rate is indistinguishable from a 100% failure rate. Hell, in any industry.

You're basically saying "we need human review for literally everything AI outputs because we have no way of saying whether anything it produces is hallucination or not. And since it produces plausible-sounding things really fast, it puts enormous burden on human reviewers".


ToValueFunfetti 9 days ago

I just don't understand where your position is coming from here. You can't distinguish between a machine that says "here, look at these 170 results, 10% of them are highly serious problems that you should address, you should have some people look into that" and one that shrugs and says "I dunno, maybe just double check everything"? I assume you've come to this conclusion based on some reasoning, but you're not sharing it in this response AFAICT.
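For what it's worth, here is the arithmetic behind the 170 figure as a rough sketch. It assumes "90% false positive rate" means that 90% of the flagged items turn out to be false alarms, which the thread never pins down. The function name and the returned keys are made up for illustration.

```python
def triage_summary(real_findings: int, false_share: float) -> dict:
    """Work backwards from the number of real findings to the total flagged.

    false_share is the assumed fraction of flagged items that are false
    alarms (an interpretation of "false positive rate", not a given).
    """
    if not 0 <= false_share < 1:
        raise ValueError("false_share must be in [0, 1)")
    # If only (1 - false_share) of the flagged items are real, then
    # flagged = real / (1 - false_share).
    flagged = round(real_findings / (1 - false_share))
    return {
        "flagged": flagged,
        "real": real_findings,
        "false": flagged - real_findings,
    }


summary = triage_summary(17, 0.90)
print(summary)
```

Under that assumption, 17 real gaps correspond to 170 flagged items, 153 of which a reviewer would have to look at and dismiss. Whether that review cost is worth it is exactly what the two sides here disagree on.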
