Hacker News
by skavi 582 days ago
and the repo for this project: https://github.com/microsoft/BitNet
1 comment

The demo they showed was full of repeated sentences. The 3B model looks quite dense, TBH. Did they just want to show the speed?
3B models, especially when quantized, almost always behave like this.