In this case I used 'usually' because don't remember all details and didn't want to generalize by saying 'always', but also training/benchmarking protocol can be flawed, for example LLM still can solve shallow reasoning problem by memorizing pattern.
That implies - sometimes not. Which would prove at least some reasoning capabilities.