by embedding-shape 209 days ago
Correct me if I'm wrong, but in this demo video of the user instructing the model to use `git bisect` to find a commit (https://storage.googleapis.com/gweb-developer-goog-blog-asse...), doesn't this actually showcase a big issue with today's models?

In the end, the model ran `git bisect` (if we're to believe the video, at least) for no useful reason; it isn't being used for what it's normally used for. Why did it run bisect at all? Because the user asked the LLM to use `git bisect` to find a specific commit. But that doesn't make sense: `git bisect` is not for that, so what the user is asking for isn't possible.
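
For reference, what `git bisect` is normally used for is a binary search over history for the first commit where some check starts failing. Here's a minimal Python sketch of that idea, not of git's actual implementation. It assumes a linear history, and `first_bad_commit`, `is_bad`, and the commit names are hypothetical, not anything from the video:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: the regression landed in "c5".
history = [f"c{i}" for i in range(8)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

The point being: bisecting only earns its keep when you don't know the commit yet but can test any given commit for good or bad.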

Instead of stopping and saying "Hey, that's not the right idea, did you mean ... ?" to make sure the request is actually possible and is what the user wants, the model runs its own race and starts invoking a bunch of other git commands, because that's how you'd actually find the commit the user is looking for. Then it finally does some git bisecting just for fun, even though it had already found the right commit.

I think I see the same thing when letting LLMs code as well. If you give them work that is actually impossible, but the words kind of make sense, they'll produce something, just not what you wanted. I think they're doing exactly the same thing: bypassing what you clearly instructed so that they at least do something.

I'm not sure if I'm just hallucinating that they act like that, but LLMs doing "the wrong thing" has hit me more than once. Now imagine something more dangerous than `do a git bisect`: it seems to me that video is telling us Gemini 3 Pro will act exactly the same way, with no improvement on that front.

Also, do these blog posts not go through review from engineering before they're published? Besides the video not really showcasing anything of interest, the prompt itself doesn't make any sense, and that would have been caught if an engineer who uses git at least weekly had reviewed it beforehand.

1 comment

Looks right to me. At t=0:50 it shows other git bisect commands being run. The git bisect reset at the end is ending the bisection as it's complete.

Video is really a terrible format for terminal demos; you've got to pause it because the screen flashes text faster than you can read...

> Looks right to me. At t=0:50 it shows other git bisect commands being run. The git bisect reset at the end is ending the bisection as it's complete.

But what is that actually doing? It looks like by the time it runs git bisect, it already knows what the commit is and could have just returned it. The only reason it ran any bisecting at all was that the user (erroneously) asked it specifically to use git bisect. It didn't have to.