by psytrx
858 days ago
In addition to the initial loading time noted by the other posters: you may want to use the same inference engine, or even the same LLM, for multiple purposes across multiple applications. Another huge factor, in my opinion, is getting your machine, environment and OS into a state that can run the models efficiently. That wasn't trivial for me. Putting all this complexity inside a container (and therefore a "server") helps tremendously, both a) in setting everything up initially and b) in keeping up with the constant stream of improvements and updates.