by npodbielski
30 days ago
Yes, I was thinking about the same approach. I have a Strix Halo, and it slows down with longer contexts, so keeping the context under 10k tokens would be achievable this way. If this could be done with a small model that runs at more than 50 tokens/s, that would be huge. Unfortunately, I am caught up in other projects right now, at work and otherwise, so I have only tried a few dozen prompts to see whether this is even achievable.