Are LLMs approaching a saturation point?

LLM benchmarks are normalized between 0 and 100.

Top models already score close to 100 on the main benchmarks:
- common sense reasoning (WinoGrande)
- grade-school math word problems (GSM8K)
- multitask language understanding (MMLU)
- sentence completion (HellaSwag)
- science question reasoning (ARC, the AI2 Reasoning Challenge)
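To make "close to 100" concrete, here is a minimal sketch of how the remaining headroom on a 0 to 100 benchmark can be computed. The function names and the scores are hypothetical, chosen only to illustrate the arithmetic; they are not real results for any model.

```python
def headroom(score: float, max_score: float = 100.0) -> float:
    """Points left before a benchmark on a 0-100 scale is maxed out."""
    if not 0.0 <= score <= max_score:
        raise ValueError("score must lie within the benchmark's scale")
    return max_score - score


def relative_error_reduction(old_score: float, new_score: float) -> float:
    """Fraction of the remaining errors removed when moving from old_score to new_score."""
    old_gap = headroom(old_score)
    if old_gap == 0:
        raise ValueError("benchmark is already saturated")
    return (old_gap - headroom(new_score)) / old_gap


# Hypothetical scores, for illustration only.
print(headroom(95.0))                        # 5.0
print(relative_error_reduction(90.0, 95.0))  # 0.5
```

The closer a score gets to 100, the less room a benchmark has to tell two models apart, even when the relative drop in errors is large.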

The only things that could change this picture are a change in the Transformer architecture or new benchmarks that measure model capabilities the current ones do not cover.

What's next? Increasing performance and decreasing token cost have the potential to open up more complex use cases.

A likely scenario is that this leads to the emergence of dedicated LLM processors, with models running entirely locally.

Any thoughts?