Hacker News
by 303bookworm 540 days ago
Really excited to see this! Two questions: 1. Did you try RTD (ELECTRA-like pretraining)? Or did you skip that for compatibility reasons? 2. Why not incorporate Jamba-like alternating Mamba2 layers?