by wtarreau
1138 days ago
That's very interesting for performing basic tasks at reasonable speeds or for running on smaller systems. Unfortunately, it's one of the many projects based on Python and transformers, so all the resources saved by the compact model are wasted by the heavy engine and ecosystem: even a 4 GB machine with 4 GB of swap goes OOM because the loaded data gets duplicated in memory through read() and malloc() :-( Let's wait for someone to port it to a cheaper and more powerful C-based engine like llama.cpp.
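To show why read() plus malloc() hurts, here is a minimal C sketch that assumes a POSIX system. read() copies the whole file into anonymous memory owned by the process. That memory can only be swapped out, not simply dropped. Mapping the file with mmap() instead lets the process use the kernel's page-cache pages directly, and those clean file-backed pages can be discarded and re-read under memory pressure. The function names load_with_read() and load_with_mmap() are hypothetical and not taken from any particular engine. This is not a claim about how any specific C engine loads its weights.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies the whole file into a malloc()'d buffer.
 * The copy lives in anonymous process memory, separate from the page cache.
 * Returns NULL on failure; release with free(). */
static void *load_with_read(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    char *buf = malloc(size ? size : 1);
    size_t done = 0;
    while (buf && done < size) {
        ssize_t n = read(fd, buf + done, size - done);
        if (n <= 0) {            /* read error or unexpected EOF */
            free(buf);
            buf = NULL;
            break;
        }
        done += (size_t)n;
    }
    close(fd);
    if (buf)
        *len = size;
    return buf;
}

/* Hypothetical helper: maps the file read-only. Pages are backed by the
 * page cache and faulted in on demand, so no private copy is made.
 * Returns NULL on failure (including empty files); release with munmap(). */
static void *load_with_mmap(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

Both helpers return the same bytes. With the mapped version, the kernel can reclaim the model's pages instead of pushing them to swap, which matters on a small machine like the 4 GB one described above.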