Hacker News
by trextrex 740 days ago
I'm not clear on what advantage this architecture has over Mamba/Griffin. Those also scale linearly with sequence length, offer better sequence parallelism, and are competitive in performance with transformers.
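For context on the "linear scaling" and "sequence parallelism" points: Mamba- and Griffin-style layers are built on linear recurrences of the form h_t = a_t * h_{t-1} + b_t. A recurrence like this can be run step by step in O(T), or, because composing its steps is associative, with a parallel prefix scan. Below is a minimal scalar sketch under that assumption. It is not either model's actual layer, and the function names are made up for illustration.

```python
import math

# Toy scalar recurrence h_t = a_t * h_{t-1} + b_t (hypothetical illustration,
# not the real Mamba or Griffin layer).

def sequential_scan(a, b, h0=0.0):
    """O(T) work, but T dependent steps that must run in order."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def combine(left, right):
    """Compose two affine maps h -> a*h + b: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def parallel_scan(a, b, h0=0.0):
    """Hillis-Steele inclusive scan: about log2(T) rounds. Updates within a round
    are independent of each other, so each round could run in parallel."""
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # Each prefix (A, B) maps h0 to h_t = A * h0 + B.
    return [A * h0 + B for A, B in elems]

if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 0.7, 1.0]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    seq, par = sequential_scan(a, b), parallel_scan(a, b)
    print(all(math.isclose(x, y) for x, y in zip(seq, par)))
```

Both functions compute the same outputs. The scan version does more total work (O(T log T) here), but its depth is only logarithmic in T. That trade-off is what makes these recurrences parallelizable across the sequence during training.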

The whole field seems to be having issues with comparisons right now.

We really don't even know how Mamba and Griffin compare.

state tracking...