by blainm 849 days ago
I would be curious to know if anyone has tried a hybrid approach: a Mamba-like architecture for longer-term recall, combined with a transformer for short-term memory.
2 comments

Yep, https://arxiv.org/abs/2402.04248 tried a Mambaformer which seemed to perform well.
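For anyone wondering what "hybrid" could look like in the smallest possible form, here's a rough pure-Python sketch. To be clear, this is not the architecture from the linked paper and not real Mamba: `ssm_scan` is just a fixed-decay linear recurrence (no input-dependent selection), `local_causal_attention` has no learned projections (q = k = v), and all the names and parameters (`decay`, `window`, `hybrid_block`) are made up for illustration. The only point is the split: a fixed-size recurrent state carries long-range information, and windowed attention handles the recent tokens.

```python
import math

def ssm_scan(xs, decay=0.9):
    """Toy stand-in for an SSM layer: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    The state h has a fixed size regardless of sequence length, so it can
    carry (lossy) information from arbitrarily far back.
    """
    h = [0.0] * len(xs[0])
    out = []
    for x in xs:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
        out.append(h)
    return out

def local_causal_attention(xs, window=4):
    """Single-head softmax attention over the last `window` positions only.

    Each position attends to itself and up to window - 1 earlier positions.
    No learned weights: queries, keys and values are all the raw inputs.
    """
    dim = len(xs[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for t, q in enumerate(xs):
        keys = xs[max(0, t - window + 1):t + 1]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) / total
                    for j in range(dim)])
    return out

def hybrid_block(xs, decay=0.9, window=4):
    """Residual recurrence layer followed by residual local attention."""
    long_range = ssm_scan(xs, decay)
    h = [[a + b for a, b in zip(x, s)] for x, s in zip(xs, long_range)]
    short_range = local_causal_attention(h, window)
    return [[a + b for a, b in zip(x, s)] for x, s in zip(h, short_range)]
```

Calling `hybrid_block(tokens)` on a list of equal-length float vectors returns a list of the same shape. A real model would add learned projections, normalization, and a proper selective scan, and might interleave the two layer types differently.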
Maybe a fun Karpathy video here...