The point is that if the model were really "reasoning", it would fail differently. Instead, the way it fails is consistent with it BSing on a textual level.