If you had to reduce it to one thing, it's probably that language models are capable few shot and zero shot learners. In other words, training a model to simply predict the next word on naturally occurring text, you end up with an tool you can use for generic tasks, roughly speaking.