Hacker News
by
bhadass
141 days ago
Some of NVIDIA's models also tend to have interesting architectures. For example, they use the Mamba architecture rather than relying purely on transformers:
https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...
1 comment
nextos
141 days ago
Deep state space models (SSMs), including the entire S4-to-Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier than transformers to train and to scale over large context windows.