Hacker News
by Jasssss 33 days ago
Nice! Mistral 7B v0.1 sets sliding_window: 4096 in its HuggingFace config.json (though v0.2 sets it to null). Gemma 2 alternates between sliding-window attention (window of 4096) and full attention on every other layer. Both expose the field in the model config, so maybe you could pull it from the same API you're already using.
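Rough sketch of what I mean, assuming you read config.json straight off the Hub. The function names here are made up for illustration, and I'm guessing at how your tool fetches things, so adapt as needed. Gated repos (Gemma 2 is one) would also need an auth token, which this leaves out.

```python
import json
import urllib.request
from typing import Optional


def sliding_window_from_config(config: dict) -> Optional[int]:
    """Return the sliding-window size, or None when the field is missing or null."""
    window = config.get("sliding_window")
    # A null/missing value is treated as "no sliding window" (full attention).
    return window if isinstance(window, int) else None


def fetch_config(repo_id: str, revision: str = "main") -> dict:
    """Download config.json for a public Hub repo (hypothetical helper, no auth)."""
    url = f"https://huggingface.co/{repo_id}/resolve/{revision}/config.json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage (needs network access):
# cfg = fetch_config("mistralai/Mistral-7B-v0.1")
# print(sliding_window_from_config(cfg))
```

Note this only reads the single sliding_window value. For a model that mixes windowed and full-attention layers, the field alone doesn't say which layers use the window, so you'd still need to handle that case separately.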