by
blainm
849 days ago
I'd be curious to know whether anyone has tried a hybrid approach: a Mamba-like architecture for longer-term recall, combined with a transformer for short-term memory.
2 comments
logicchains
849 days ago
Yep, https://arxiv.org/abs/2402.04248 tried a Mambaformer, which seemed to perform well.
enonimal
849 days ago
Maybe a fun Karpathy video here...