Hacker News
LLMs Are Complicated Now (ianbarber.blog)
38 points by matt_d 10 hours ago
1 comment

Why didn't the author compare Llama 3 with GLM 5.2 (released 1 week ago), which is a more standard attention-based LLM? Comparing two separate families of LLMs and then pointing out that they are different yields an unsurprising result, and it detracts from the point the author is trying to make.

https://sebastianraschka.com/llm-architecture-gallery/?compa...

If you look at it, the diagrams are very similar. The main differences are that the feed-forward block is replaced with an MoE (a router that dispatches to multiple feed-forward networks) and that the model uses a different attention implementation.
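To make the "router to multiple feedforwards" point concrete, here is a toy sketch in plain Python. It is not GLM's or Llama's actual code. The dimensions, the ReLU activation, the top-k value, and the choice to softmax over only the selected experts' logits are assumptions for illustration, and every class and function name is hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    # Small Gaussian init; values are arbitrary for the sketch.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    # W is a list of rows, so this computes W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForward:
    """Dense FFN block: d_model -> d_hidden -> d_model (ReLU assumed for simplicity)."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_in = random_matrix(d_hidden, d_model, rng)
        self.w_out = random_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        return matvec(self.w_out, relu(matvec(self.w_in, x)))


class MoE:
    """Router plus several FeedForward experts; each token is sent to its top-k experts."""

    def __init__(self, d_model, d_hidden, n_experts, top_k, rng):
        self.experts = [FeedForward(d_model, d_hidden, rng) for _ in range(n_experts)]
        self.router = random_matrix(n_experts, d_model, rng)  # one logit per expert
        self.top_k = top_k

    def route(self, x):
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Assumed variant: softmax over the selected experts' logits only.
        weights = softmax([logits[i] for i in top])
        return top, weights

    def __call__(self, x):
        top, weights = self.route(x)
        out = [0.0] * len(x)
        for i, w in zip(top, weights):  # only top_k experts are evaluated
            y = self.experts[i](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # one token's hidden state
    dense = FeedForward(8, 32, rng)
    moe = MoE(8, 32, n_experts=4, top_k=2, rng=rng)
    print(len(dense(x)), len(moe(x)))  # 8 8
    print(moe.route(x))  # chosen expert indices and their mixing weights
```

Both blocks map a d_model vector to a d_model vector, which is why the MoE can slot into the same place in the diagram. The difference is that the dense FFN uses all of its parameters for every token, while the MoE holds n_experts FFNs but evaluates only top_k of them per token, so total parameter count grows without a matching growth in per-token compute.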

It’s written by AI.