by npn
7 days ago
How? Edit: now that I've read the article fully, it seems they use some very effective MTP algorithm, and somehow the quality is still decent enough. Though I doubt the quality really only drops a bit like they claimed. Maybe on the benchmarks, but in general use, heavily quantized models very often give worse results.
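To make the quantization worry concrete, here is a toy sketch (not the article's method) of how round-trip error grows as the bit width shrinks. It assumes plain symmetric uniform quantization with a single scale per tensor and random Gaussian "weights"; the names `quantize_roundtrip` and `mean_abs_error` are made up for illustration. It only measures weight error, which is not the same thing as model quality.

```python
import random


def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize.

    Assumption: one scale for the whole list (per-tensor scaling).
    """
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    # Snap each weight to the nearest representable level, then map it back.
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and restored weights."""
    restored = quantize_roundtrip(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {bits: mean_abs_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

The error climbs steeply from 8 to 4 to 2 bits. Real schemes (per-group scales, outlier handling, quantization-aware training) do much better than this naive version, so treat it as intuition only.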
- persistent CUDA kernel
- tiled processing with overlapping read/writes
- model designed with specific constraints in mind