by ilaksh 922 days ago
Wow. I wish I could get a computer or VM/VPS with this. Or rent part of one. Use it with quantized models and llama.cpp.

Seems like a big part of using these systems effectively is thinking of ways to take advantage of batching. I guess the normal thing is just to handle multiple users' requests simultaneously. But maybe another one could be moving from working with single agents to agent swarms.
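
To make the batching idea concrete, here is a minimal sketch in Python. It assumes a hypothetical `model_fn` that accepts a list of prompts and returns one output per prompt; `fake_model` is only a stand-in, not llama.cpp or any real inference API.

```python
from typing import Callable, List


def run_batched(
    prompts: List[str],
    batch_size: int,
    model_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Group pending prompts into batches and call the model once per batch."""
    outputs: List[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # One model call serves every request in this batch.
        outputs.extend(model_fn(batch))
    return outputs


def fake_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a real forward pass over a batch.
    return [prompt.upper() for prompt in batch]


if __name__ == "__main__":
    pending = ["hi", "what is batching?", "summarize this"]
    print(run_batched(pending, batch_size=2, model_fn=fake_model))
```

The same loop would apply whether the prompts come from several users or from several agents in a swarm; what matters is that they are pending at the same time.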

1 comment

I don’t see them doing direct sales; it looks like a cloud offering.

For training, the big part of using these things isn’t batching; it’s mainly designing the network, cleaning the data, and then training it to get results. Training does involve batching, but that’s already baked into the libraries.

For inference, you take the trained model, which is huge, load it into memory, and have it predict output. This architecture is designed not to need quantization: lower precision is for when you want to use less memory, while this has a huge amount of memory. To handle multiple users’ requests you don’t do batching; a message queue with multiple receivers, each holding a copy of the latest trained model, would work.
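
A rough sketch of that queue-with-receivers setup, in Python with only the standard library. Everything here is an assumption for illustration: `make_model_copy` is a hypothetical stand-in for loading a copy of the trained model, and threads stand in for what would more likely be separate processes or machines.

```python
import queue
import threading
from typing import Callable, Dict, List


def make_model_copy(worker_id: int) -> Callable[[str], str]:
    # Hypothetical stand-in for loading one copy of the trained model.
    def predict(prompt: str) -> str:
        return f"worker{worker_id}:{prompt[::-1]}"
    return predict


def worker(worker_id: int, requests: queue.Queue, responses: Dict[int, str]) -> None:
    model = make_model_copy(worker_id)  # each receiver holds its own copy
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work for this receiver
            break
        request_id, prompt = item
        responses[request_id] = model(prompt)


def serve(prompts: List[str], num_workers: int = 2) -> Dict[int, str]:
    requests: queue.Queue = queue.Queue()
    responses: Dict[int, str] = {}
    threads = [
        threading.Thread(target=worker, args=(i, requests, responses))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for request_id, prompt in enumerate(prompts):
        requests.put((request_id, prompt))
    for _ in threads:
        requests.put(None)  # one sentinel per receiver
    for t in threads:
        t.join()
    return responses


if __name__ == "__main__":
    print(serve(["hello", "abc", "queue"], num_workers=2))
```

Each request is picked up by whichever receiver is free, so requests are handled one at a time per model copy rather than grouped into a batch.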