by coder68
300 days ago
I can confirm that DistilBERT has worked well for me on classification, especially with shortish sequences. I'm really interested in trying ModernBERT, or a smaller variant of it, because of its larger context window (8192 tokens).