Hacker News new | ask | show | jobs
by SuperV1234 10 days ago
Is that all that Mythos did?

Did it find any real potential issue, optimization/simplification opportunities, or sparked any thought-provoking discussion within your organization?

Or was it purely a net negative experience?

2 comments

Read their comment. It's a negative anecdote surrounded by them using genAI all the time.

You're the only one coming away thinking there was a net negative experience.

In regulated industries none of those matter if the tool invents compliance issues or breaks compliance.

The only thought-ptovoking discussion should be "why the hell do we have this stochastic parrot anywhere near out codebase"

I think that what technical people fail to understand is that a lot of the time, "compliance" is not the same as a binary compiles/does not compile. For a lot of rules/regulations, compliance means "making enough effort that legal is willing to back you up".

A system which will just randomly decide to give the legal team reasons to not back you up is:

* A system whose output will get brought up in lawsuits and make legal's job harder.

* A system that will make the dev team perpetually chase its tail while it oscillates between the several different valid interpretations of the rules.

Odd take. So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done, the whole thing is useless?

Not saying that is the situation, I don’t know. But if “one error is too many” is your point of view… do you think the humans in these orgs are 100% perfect 100% of the time?

> So if it identified 17 real gaps and helped fix them, the fact it was wrong about one gap, and the appropriate humans caught it and no harm was done

How many gaps have humans not caught?

> But if “one error is too many” is your point of view

Yes, in regulated industries "one error is too many" is the only right approach.

Yes, humans also make errors, and there you have a range of options: from tracing and finding the causes of the error (and tightening processes) to literally jailing those responsible. Your hallucination machine will happily "identify" 17 gaps, and create 34 more. And no, there are no processes to make it better. The "make no mistakes" incantation will happily be ignored for obvious reasons, regardless of how many forms of it you throw at it.

It doesn't seem like you're engaging with the material circumstances described above. What does it mean for a human to not catch that a part of a codebase is actually compliant with regulations? What does it mean for the hallucination machine to create 34 more gaps when it doesn't appear to have more than read access? How would it not be useful to have a machine that identifies 17 real crimes that your highly regulated business is unintentonally committing even with a 90% false positive rate?
In a regulated industry 90% false positive rate is indistinguishable from 100% failure rate. Hell, in any industry.

You're basically saying "we need human review for literally everything AI outputs because we have no way of saying whether anything it produces is hallucination or not. And since it produces plausible-sounding things really fast, it puts enormous burden on human reviewers".

I just don't understand where your position is coming from here. You can't distinguish between a machine that says "here, look at these 170 results, 10% of them are highly serious problems that you should address, you should have some people look into that" and one that shrugs and says "I dunno, maybe just double check everything"? I assume you've come to this conclusion based on some reasoning, but you're not sharing it in this response AFAICT.