
I don’t see them doing direct sales, and it looks like a cloud offering.

For training, the big part of using these things isn’t batching; it’s mainly designing the network, cleaning the data, and then training it to get results. Training involves batching, but that’s already baked into the libraries.

For inference, you take the trained model, which is huge, load it into memory, and have it predict output. This architecture is designed so you don’t need quantization: lower precision is what you reach for when you want to use less memory, and this has a huge amount of memory. To handle multiple users’ requests you don’t need batching; a message queue with multiple receivers, each holding a copy of the latest trained model, would work.
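
Roughly what I mean by the queue setup, as a minimal Python sketch using only the standard library. Everything here is an assumption for illustration: `load_model` is a hypothetical stand-in for loading the real trained weights, threads stand in for separate receivers (in practice these would likely be separate processes or machines), and the in-process `queue.Queue` stands in for a real message broker.

```python
import queue
import threading


def load_model():
    # Hypothetical stand-in: a real receiver would load the trained
    # weights into memory here. This toy "model" just uppercases text.
    return lambda text: text.upper()


def worker(requests, results):
    # Each receiver loads its own copy of the model once, then serves
    # requests one at a time. No batching is involved.
    model = load_model()
    while True:
        item = requests.get()
        if item is None:  # sentinel value: no more work for this receiver
            break
        request_id, payload = item
        results[request_id] = model(payload)


def serve(payloads, num_workers=3):
    requests = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(requests, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for request_id, payload in enumerate(payloads):
        requests.put((request_id, payload))
    for _ in threads:
        requests.put(None)  # one sentinel per receiver
    for t in threads:
        t.join()
    # Return answers in the order the requests were submitted.
    return [results[i] for i in range(len(payloads))]
```

Calling `serve(["hello", "world"])` hands each request to whichever receiver is free. Scaling to more users means adding receivers, not making batches bigger.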