by echelon
2148 days ago
I need better error messages, but I believe it should already respond with something stating that the input is too long. What might have happened is that the instance your request was routed to got OOM-killed. I've provisioned lots of memory, but these models are pretty massive, and each inference run has to allocate a lot of matrices in memory. This is all CPU inference, not GPU. When pods get OOM-killed, they restart. The cluster for each speaker is about 5-10 pods (with some double tenancy).
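
For illustration only, a length check with a clearer error message could look roughly like the sketch below. The limit `MAX_CHARS`, the `TextTooLongError` class, and the `validate_text` function are all hypothetical names made up for this example, not the service's actual code or its real cap.

```python
MAX_CHARS = 500  # hypothetical limit; the real cap is not stated in this thread


class TextTooLongError(ValueError):
    """Raised when the request text exceeds the allowed length."""


def validate_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return the stripped text, or raise with a message a client can display."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("Text is empty.")
    if len(cleaned) > max_chars:
        # Reject before inference so an oversized input never reaches the model.
        raise TextTooLongError(
            f"Text is too long: {len(cleaned)} characters (limit {max_chars})."
        )
    return cleaned
```

Rejecting oversized input before the model runs means the user sees an explicit message instead of a dropped request if the pod would otherwise run out of memory.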