It sounds like using less tokens (or, less output due to a more compact syntax) is like a micro-optimization; code should be written for readability, not for compactness. That said, there are some really compact programming languages out there if this is what you need to optimize for.
IMO it's more likely to get confused because there are less unique tokens to differentiate between syntax (e.x. pipe when we want bitwise-or or vice-versa)
I’ve heard it said before on HN that this is not true in general because more tokens in familiar patterns helps the model understand what it’s doing (vs. very terse and novel syntax).
Otherwise LLMs would excel at writing APL and similar languages, but seems like that’s not the case.