Hacker News
by lupusreal 266 days ago
That's why you tell Claude Code to write tests, and use them, use linting tools, etc. And then you test the code yourself. If you're still concerned, run /clear, then tell Claude Code that some other idiot wrote the code and that it needs to tear it apart and critique it.

Hallucination is not an intractable problem; the stochastic nature of hallucinations makes it easy to use the same tools to catch them. I feel like hallucinations have become a cop out, an excuse, for people who don't want to learn how to use these new tools anyway.

2 comments

> That's why you tell Claude Code to write tests, and use them

I've seen Python unit tests emitted by an LLM that, for a given class under test, start with:

    def test_foo_can_be_imported(self):
        try:
            from a.b.c import Foo
        except ImportError:
            self.fail()

    def test_foo_can_be_instantiated(self):
        from a.b.c import Foo
        instance = Foo()
        self.assertIsNotNone(instance)
        self.assertTrue(isinstance(instance, Foo))

    def test_other_stuff_that_relies_on_importing_and_instantiating_foo(self):
        ...

And I've watched Cursor do multiple rounds of:

"1: The tests failed! I better change the code. 2: The tests failed! I better change the tests. GOTO 1"

until it gets passing tests, sometimes by outright deleting tests or by hardcoding values to make them pass.
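
One partial guard against the "deleting tests" failure mode is to snapshot the test names before an agent runs and diff them afterwards. The following is a minimal sketch under stated assumptions, not anything from the thread: `collect_test_ids`, `missing_tests`, and the `Before`/`After` classes are hypothetical names invented for illustration, and the check does nothing about hardcoded values.

```python
import unittest


def collect_test_ids(suite):
    """Flatten a (possibly nested) unittest suite into a set of test IDs."""
    ids = set()
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            ids |= collect_test_ids(item)
        else:
            ids.add(item.id())
    return ids


def missing_tests(baseline_ids, current_ids):
    """Return the names present in the baseline but absent now, sorted."""
    return sorted(set(baseline_ids) - set(current_ids))


# Hypothetical stand-ins for one test class before and after an agent's edits.
class Before(unittest.TestCase):
    def test_parse(self):
        pass

    def test_round_trip(self):
        pass


class After(unittest.TestCase):
    def test_parse(self):
        pass


def method_names(test_case_class):
    """Collect bare method names so the two snapshots are comparable."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    return {test_id.split(".")[-1] for test_id in collect_test_ids(suite)}


baseline = method_names(Before)
current = method_names(After)
print(missing_tests(baseline, current))  # ['test_round_trip']
```

In practice the baseline would presumably come from the committed test suite rather than a second class in the same file, and a non-empty result would only be a prompt for human review, since a test can legitimately be removed.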

So I don't have the same faith in LLM-authored tests as you do.
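
For contrast with the import and instantiation tests quoted above, here is a sketch of a test that checks behavior instead. It assumes a hypothetical `Foo` with a `total()` method, since the thread never shows the real class; the point is only that a behavioral assertion already fails if the import or the constructor is broken.

```python
import unittest


class Foo:
    """Hypothetical class under test; the real Foo is not shown in the thread."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def total(self):
        return sum(self.items)


class FooBehaviorTest(unittest.TestCase):
    def test_total_sums_items(self):
        # Constructing Foo and calling it covers "can be imported" and
        # "can be instantiated" implicitly, while also checking a result.
        self.assertEqual(Foo([1, 2, 3]).total(), 6)

    def test_total_of_empty_foo_is_zero(self):
        self.assertEqual(Foo().total(), 0)


def run_suite():
    """Run the behavior tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(FooBehaviorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both tests and returns a result whose `wasSuccessful()` method reports whether they all passed.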

> I feel like hallucinations have become a cop out, an excuse, for people who don't want to learn how to use these new tools anyway.

I feel like you've taken that attitude so you can dismiss concerns you don't agree with, without having to engage with them. It's disappointing.

> you now have to not only review and double-check shitty AI code, but also hallucinated AI tests too

Gee thanks for all that extra productivity, AI overlords.

Maybe they should replace AI programmers with AI instead?

I said to make the chatbot do it, not to do all the reviewing yourself. You can do manual reviews once it makes something that works. In the meantime, you can be working on something else entirely.

> In the meantime, you can be working on something else entirely.

Like fixing useless and/or broken tests written by an LLM?

(Thank you, AI overlords, for freeing me from the pesky algorithmic and coding tedia so I can instead focus on fixing the mountains of technical debt you added!)