Hacker News
by skydhash 741 days ago
The best compression relies on understanding. What an LLM contains is mostly data about how humans use words. We understand how to produce this data (which is a compression of human text) and how to use it (to generate something). In other words, it’s “production rules”, but statistical.
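To make the “statistical production rules” analogy concrete, here is a toy bigram model: each word is rewritten into a next word chosen with probability proportional to how often that pair appeared in a corpus. The corpus and the function names `train_bigrams` and `generate` are hypothetical. This is only an illustration of the analogy, not how an LLM is implemented (LLMs learn neural next-token distributions conditioned on a long context rather than counting word pairs).

```python
import random
from collections import defaultdict, Counter

def train_bigrams(text):
    """Count how often each word follows each other word (toy 'production rules')."""
    words = text.split()
    rules = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        rules[prev][nxt] += 1
    return rules

def generate(rules, start, length, seed=0):
    """Repeatedly sample a successor of the last word, weighted by its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = rules.get(out[-1])
        if not successors:
            break  # no rule applies to this word, so generation stops
        words, counts = zip(*successors.items())
        out.append(rng.choices(words, weights=counts)[0])
    return out

# Hypothetical toy corpus
corpus = "the cat sat on the mat the dog sat on the rug"
rules = train_bigrams(corpus)
print(rules["sat"])  # Counter({'on': 2})
print(" ".join(generate(rules, "the", 6)))
```

Because each step conditions only on the previous word, changing one word early on changes every choice that follows.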

The only issue is ambiguity. What gets generated depends strongly on the order of the tokens: a slight variation can change the meaning, and then the result is worthless. Understanding is the guardrail against meaningless statements, and LLMs lack it.


You seem to entirely miss how attention layers work...
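For context on the reply: in a transformer, scaled dot-product attention recomputes each token's representation as a weighted mix of the value vectors of all tokens in the context, with weights given by a softmax over query/key similarity. The minimal pure-Python sketch below assumes small hand-picked vectors; real models learn the query, key, and value projections and stack many heads and layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (one row per token)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of all value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Hypothetical toy vectors: the query is closer to the first key,
# so the output leans toward the first value vector.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(queries, keys, values))
```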