by raffkede 205 days ago
Seems to be the first model that one-shots my secret benchmark about nested SQLite, and it did it in 30s.

Out of interest, does it one-shot it every time?
Will try again. I only tried once, on my phone, a few hours ago. Other models were able to do quite a lot but usually missed some things; this time it managed nested navigation quite well. A lot is still missing for sure, I just tested the basics with the play button in AI Studio.
It seems to be that first impression that makes all the difference, especially with the randomness that comes with LLMs in general. That may explain the 'wow this is so much better' vs. the 'this is no better than xxx' comments littered throughout this whole parent post.