Hacker News
by Zondartul 647 days ago
My hunch is that since LLMs are trained on a per-word basis (okay, per-token), vacuous verbosity is overrepresented.

If you have one normal sentence and one overly verbose one, the latter will have more tokens and therefore more weight in training.
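
To put rough numbers on that, here's a toy sketch. The assumptions are mine: whitespace splitting stands in for a real tokenizer, every token counts equally toward the training signal, and both example sentences and the `token_share` helper are made up for illustration.

```python
def token_share(sentences):
    """Return each sentence's fraction of the total token count.

    Assumption: whitespace splitting is a crude stand-in for a real
    tokenizer, and every token carries equal weight.
    """
    counts = [len(s.split()) for s in sentences]
    total = sum(counts)
    return [c / total for c in counts]


# Two made-up sentences that say roughly the same thing.
concise = "The test failed."
verbose = (
    "It is worth noting that, at the end of the day, "
    "the test did not in fact succeed."
)

shares = token_share([concise, verbose])
for label, share in zip(["concise", "verbose"], shares):
    print(f"{label}: {share:.0%} of tokens")
```

Under those assumptions the verbose sentence accounts for most of the tokens even though each sentence appears exactly once. A real tokenizer would give different counts, but the skew should point the same way.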