by somenameforme 760 days ago
Your software (Noctie) was pretty fun to play against, but it's probably no more than ~1600 Elo. Having it assess me as 2574 after beating it was nice on the ego, but a bit silly. The comparison to Lichess ratings also seems a bit off. 2574 Elo is probably much higher than 2809 Lichess unless there's been some serious rating deflation going on over there (it's been a while since I played on that site).

I assume you're using an accuracy correlation, but those fail in lots of situations. For instance, if one player is substantially better than the other, the stronger player is probably going to have near-zero pawn loss, but only because the opponent never posed any problems for them.
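To make the objection concrete, here is a minimal sketch of the kind of accuracy metric I have in mind (average centipawn loss). The function name and all the evaluation numbers are made up for illustration, and this is only my guess at an accuracy-style approach, not a description of how Noctie works.

```python
def average_centipawn_loss(best_evals, played_evals):
    """Mean evaluation lost per move, in centipawns, from the mover's side.

    best_evals[i]   -- engine evaluation after the best available move
    played_evals[i] -- engine evaluation after the move actually played
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("need one played evaluation per best evaluation")
    if not best_evals:
        return 0.0
    # A played move can never be better than the best move, so clamp at 0.
    losses = [max(0, best - played)
              for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)


# Hypothetical evaluations: a one-sided game with only easy decisions...
easy_game = average_centipawn_loss([30, 35, 40, 50], [30, 30, 40, 45])
# ...versus a sharp game where every move is a hard decision.
sharp_game = average_centipawn_loss([30, 120, -40, 200], [10, 60, -90, 110])
```

The same player could produce both numbers. The metric measures how hard the positions were as much as how strong the player is.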

Style issues also tend to break these correlations. For example, Capablanca was much more accurate, by this metric, than Kasparov. But that's because Capablanca had an extremely solid style. In reality, he would probably not fare well against Kasparov or most modern super GMs, even though many of them are far less accurate on paper.


Hi, thanks for trying Noctie!

If you did the rating test, Noctie tries to adapt to your strength while you're playing, so if you play at a 2574 level for a while, Noctie will eventually play at that level too. Since Noctie has no idea what your rating is when the game starts, its playing strength may have taken some time to adapt, which would explain why the AI seemed much weaker to you.
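As a rough illustration of why adaptation lags, here is a toy sketch in which the bot's level moves a fixed fraction of the way toward a per-move estimate of the player's strength. The function name, the 1500 starting level, and the 0.1 rate are all assumptions made for this example; this is not Noctie's actual update rule.

```python
def adapt_level(start_level, per_move_estimates, rate=0.1):
    """Toy adaptation: after each move, shift the bot's level a fixed
    fraction of the way toward the current estimate of the player."""
    level = start_level
    history = []
    for estimate in per_move_estimates:
        level += rate * (estimate - level)
        history.append(round(level))
    return history


# Hypothetical: the bot starts at 1500 while the player keeps
# playing at a 2574 level for 30 moves.
history = adapt_level(1500, [2574] * 30)
print(history[0], history[9], history[-1])
```

With a scheme like this, the bot is still well below the player's level for the first dozen moves or so, which is often when a game is decided.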

The max strength of the AI is about 2700–2800 FIDE (level "Queen 4" inside the app if you have an account).

Per this site, https://chessgoals.com/rating-comparison/#lichessotb, 2809 Lichess is equal to around 2550 FIDE, so if that's your Lichess rating (wow, btw!), maybe Noctie wasn't so far off. (EDIT: Ah I see, that's not your rating, sorry)

Obviously, you might get a different result next time – one game is very little information to base an accurate estimate on, especially when the AI has to adapt its playing strength as we go.
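One way to see how little a single result says is the standard Elo expected-score formula. This is a general illustration only, not how Noctie's estimate is computed, and the 1600 bot level is a hypothetical value.

```python
def expected_score(player_rating, opponent_rating):
    """Standard Elo expected score (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400))


BOT_LEVEL = 1600  # hypothetical: the strength the bot actually played at

# Very different players are all heavy favorites against the same bot,
# so a single win barely separates them.
for rating in (2000, 2300, 2574):
    print(rating, round(expected_score(rating, BOT_LEVEL), 3))
```

All three players are expected to score above 90% here, so the result of one game alone cannot tell them apart.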

I don't use accuracy for the rating estimation BTW, I use custom neural networks that observe patterns in how humans at various rating levels play chess.

I found an interesting example of an equal but opposite problem. After this game:

1. e4 c5 2. Nc3 g6 3. f4 Bg7 4. Nf3 d6 5. Bb5+ Bd7 6. Bc4 Nc6 7. O-O Qb6 8. d3 Nf6 9. e5 Ng4 10. Bxf7+ Kxf7 11. e6+ Bxe6 12. Ng5+ Kf6 13. Nxe6 Kxe6 14. Qxg4+ Kf7 15. f5 Ne5 16. fxg6+ Ke8 17. Qe6 Rf8 18. Nd5 Rxf1+ 19. Kxf1 Qd8 20. Bg5 Nxg6 21. Re1 Kf8 22. Nxe7 Nxe7 23. Bxe7+ Qxe7 24. Qxe7+ Kg8 25. Qxb7 Rf8+ 26. Kg1 Be5 27. Rxe5 dxe5 28. Qd5+ Kh8 29. Qxe5+ Kg8 30. Qxc5 Rf7 31. d4 Rf8 32. d5 Rf5 33. Qe7 Rf7 34. Qe8+ Rf8 35. Qe6+ Kg7 36. d6 Rf6 37. Qe7+ Rf7 38. Qe5+ Kg6 39. h4 Rf5 40. Qe8+ Kf6 41. d7 Rd5 42. d8=Q+ Rxd8 43. Qxd8+ Kf5 44. Qd7+ Kf4 45. Kf2 Ke4 46. Qxh7+ Kd4 47. Qd3+ Kc5 48. Ke3 Kb6 49. Qd6+ Kb7 50. Kd4 Kc8 51. Qe7 Kb8 52. Kc5 Ka8 53. Kc6 a6 54. Qb7#

The LLM has decided I'm rated 1784 in what was probably the most one-sided game I've played against it.

Okay, I played it a few more times and it definitely does not seem to be scaling up properly. Here is a game where it was getting outplayed pretty substantially in the opening with equal material, but it still severely misplayed tactical situations once things started to explode later on, by which point it should have long since ramped up.

---

1. e4 e6 2. d3 c6 3. Nf3 d5 4. Nbd2 Nf6 5. e5 Nfd7 6. Be2 Be7 7. O-O O-O 8. Re1 f6 9. d4 fxe5 10. dxe5 Qc7 11. Bd3 Bc5 12. Nf1 Qb6 13. Qe2 Na6 14. a3 Nc7 15. b4 Be7 16. Bg5 Bxg5 17. Nxg5 h6 18. Nh7 Rf7 19. Bg6 Re7 20. Kh1 Nb5 21. f4 Nf8 22. Nxf8 Kxf8 23. Ng3 Bd7 24. Qg4 Nd4 25. Bd3 Be8 26. c3 Nb5 27. Bxb5 cxb5 28. f5 exf5 29. Nxf5 Rf7 30. e6 Rxf5 31. Qxf5+ Ke7 32. Rf1 Qxe6 33. Rae1 Qxe1 34. Rxe1+ Kd6 35. Qe6+ Kc7 36. Qxd5 Bc6 37. Re7+ Kb6 38. Qc5+ Ka6 39. c4 Rf8 40. h3 Rf1+ 41. Kh2 Rf5 42. cxb5+ Bxb5 43. Re6+ Bc6 44. Rxc6+ b6 45. b5+ Kb7 46. Rc7+ Kb8 47. Rc8+ Kb7 48. Qc7#

---

Interesting! I just played it again and can definitely see what you mean. But it keeps running into the same issue. It ends up giving itself lost positions early on which it's not really capable of defending. I was about to ask why you didn't go the other way (strong at first then gradually handicapping) but on the other hand I've never had anywhere near this much fun playing a bot, and maybe this is part of the reason why?

Well, another obvious factor is that it plays in an extremely human-like fashion. I'm a relatively strong player and have been the reason for plenty of (C) labels, but I would never, in a million years, think I was playing a bot here. Anyhow, awesome job.