The comment is bogus, empty snark (and factually wrong).
The arguments made (and I use the word arguments loosely):
"Too few trainable_params compared to GPT-3".
GPT-3 is several orders of magnitude larger than what people actually train, so it's a useless comparison. It's like we're comparing a bike to an e-bike, and someone says "yeah, but can the e-bike go faster than a rocket?"
Second argument: "Sure, it's faster than a machine that costs 3-4 times more, but you should instead compare it to a machine that costs even more than that".