by tgtweak
620 days ago
Yeah, combining these two would make a lot of sense. There is a big appetite to run larger models, even if slower, on clustered hardware. This way you can add compute to speed up the token rate, rather than adding it just to be able to run the model at all. Some of these optimizations could also help tune how the model is distributed across nodes, based on the latency and bandwidth between them.