by gemeral
844 days ago
> and blowing up parameter count to make up for it

Based on an admittedly rapid and indulgent reading of the paper, it seems like they're not increasing the parameter size. Do you mind pointing out where the blowup is occurring?

But since they are more memory efficient (up to 8x, or 10x if ternary values are packed more tightly than 2 bits each; in practice it seems more like 3-5x once you account for the other, larger structures that still need to be in memory), the largest models can be that much larger.
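
To make the 8x and 10x figures concrete, here is a rough sketch of the arithmetic, assuming the baseline is 16-bit weights (my assumption, not something stated above). Storing each ternary value in 2 bits gives 16/2 = 8x. Packing five ternary values into one byte works because 3^5 = 243 fits in 256, which is 1.6 bits per value and gives 10x. The function names `pack_trits` and `unpack_trits` are made up for illustration and are not from the paper.

```python
import math

FP16_BITS = 16  # assumed baseline: 16-bit weights (not stated in the thread)


def pack_trits(trits):
    """Pack ternary values (-1, 0, 1) five to a byte as base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # Build the base-3 number so the first trit is the lowest digit.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)  # map -1, 0, 1 to 0, 1, 2
        out.append(value)  # at most 3**5 - 1 = 242, so it fits in a byte
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; count drops padding from a short final byte."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


# Compression ratios relative to the assumed 16-bit baseline.
ratio_2bit = FP16_BITS / 2               # one trit per 2 bits
ratio_packed = FP16_BITS * 5 / 8         # five trits per 8 bits
ratio_limit = FP16_BITS / math.log2(3)   # information-theoretic bound
```

The 3-5x "in practice" number is not modeled here, since it depends on whatever else has to stay in memory at higher precision.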