(co)-author here. It was really interesting putting this together. We had some idea what LLMs/GPT-4 would and would not do well with, but we were still surprised by a number of things. In particular, we knew it would really struggle with the acrostic, but the degree to which it just completely lost the plot was pretty surprising! It was also surprisingly difficult, in a lot of cases, to convince it that Queen Elizabeth II had died (it takes the news better sometimes than others).
Glad you liked it! My co-author and I spent some time trying to find a good spread of questions covering different aspects of LLMs. And we were also surprised by how often it was hard to convince GPT-4 that Queen Elizabeth II had died. (We found that specifying she died in 2022 helped a lot.)
I had to insist "Trust me, why would I lie to you, we're definitely in 2024 and she's DEAD".
I got 8 out of 9 (89%). I only got the rhyming one wrong, because I also couldn't figure out the correct answer (English is my second language). Very fun nonetheless!