Correct me if I'm wrong, but doesn't this demo video, where the user instructs the model to use `git bisect` to find a commit (https://storage.googleapis.com/gweb-developer-goog-blog-asse...), actually showcase a big issue with today's models? In the end, the model only ran `git bisect` (if we're to believe the video, at least) for various pointless reasons; it isn't being used for what it's usually used for.

Why did it run bisect at all? Well, the user asked the LLM to use `git bisect` to find a specific commit, but that doesn't make sense. `git bisect` is not for that (it binary-searches the history between a known good and a known bad commit to find the one that introduced a change), so what the user is asking for isn't possible. Instead of stopping and saying "Hey, that's not the right idea, did you mean ...?" to make sure the request is actually possible and is what the user wants, the model runs its own race and starts invoking a bunch of other git commands, because that's how you'd find the commit the user is looking for. Then it finally does some git bisecting just for fun, after it had already found the right commit.

I think I see the same thing when letting LLMs code as well. If you give them work that is actually impossible but where the words kind of make sense, they'll produce something, just not what you wanted. I think they're doing exactly the same thing: bypassing what you clearly instructed so they at least do something. I'm not sure if I'm just hallucinating that they act like that, but LLMs doing "the wrong thing" has bitten me more than once. And imagining something more dangerous than `do a git bisect`, it seems to me that video is telling us Gemini 3 Pro will act exactly the same way, with no improvement on that front.

Also, don't these blog posts go through engineering review before they're published? Besides the video not really showcasing anything of interest, the prompt itself doesn't make any sense, and it would have been caught if an engineer who uses git at least weekly had reviewed it.
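For anyone who hasn't used it: `git bisect` is meant for finding the commit that introduced a regression once you know one good and one bad commit. It binary-searches the history between them, asking you (or a script passed to `git bisect run`) whether each checked-out commit is good or bad. Below is a minimal Python sketch of that search, assuming a simplified linear history; `first_bad_commit` and the `is_bad` predicate are hypothetical stand-ins for "check out this commit and run a test", not anything git itself exposes.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Hypothetical sketch. Assumes a linear history ordered oldest to newest,
    commits[0] known good, commits[-1] known bad, and that every commit after
    the first bad one is also bad (the property bisecting relies on).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression was introduced at mid or earlier
        else:
            lo = mid  # regression was introduced after mid
    return commits[hi]


commits = [f"c{i}" for i in range(10)]
# Pretend the regression landed in c6, so c6..c9 all fail the test.
print(first_bad_commit(commits, lambda c: int(c[1:]) >= 6))
```

Here the search only tests c4, c6 and c5 before settling on c6. Real `git bisect` also copes with merge commits and skipped commits, which this sketch ignores; the point is just that bisecting answers "which commit broke this?", not "where is the commit that mentions X?".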
Video is really a terrible format for terminal demos: you have to keep pausing it because the screen flashes text faster than you can read...