by firebaze
524 days ago
We made an adapter (a dedicated CLI) that the LLM uses to interact with the app. It's kind of like an integration test, just a little more sophisticated: the LLM gets a prompt listing the CLI commands it may use, plus its "personality", and then it does what it does. On the hardware side, I personally have 2x 3090 cards on an AMD TR 79x platform with 128GB RAM, which yields around 12 tokens/sec for Llama 3.3 70B or Qwen 2.5 72B (Q5_K_M), which is okay (ingestion speed is approx. double that). If you want to know more details, feel free to drop me a message (username at liku dot social).
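For anyone wondering what such an adapter loop could look like, here is a rough Python sketch. It is not our actual code: the command names (`list-items`, `add-item`), the in-memory `app` list, and the `llm` callable are all made up for illustration. The idea is just to build a prompt from the persona plus the allowed commands, then accept only whitelisted commands from the model's reply.

```python
import shlex
from typing import Callable

# Hypothetical command set; the real adapter's commands are not described here.
ALLOWED_COMMANDS = {
    "list-items": "list-items: show all items in the app",
    "add-item": "add-item <name>: create a new item",
}


def build_prompt(personality: str) -> str:
    """Combine the persona with the list of commands the model may use."""
    lines = [
        f"You are: {personality}",
        "You may use only these commands, one per reply:",
    ]
    lines += [f"  {usage}" for usage in ALLOWED_COMMANDS.values()]
    return "\n".join(lines)


def parse_command(reply: str) -> list[str]:
    """Split the model's reply into argv and reject anything not whitelisted."""
    argv = shlex.split(reply.strip())
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {reply!r}")
    return argv


def run_step(llm: Callable[[str], str], personality: str, app: list[str]) -> str:
    """Ask the model for one command and apply it to the (stand-in) app state."""
    argv = parse_command(llm(build_prompt(personality)))
    if argv[0] == "add-item":
        if len(argv) != 2:
            raise ValueError("add-item takes exactly one argument")
        app.append(argv[1])
        return f"added {argv[1]}"
    # Only "list-items" remains after the whitelist check.
    return "\n".join(app)
```

In practice `llm` would wrap whatever local inference server you run, and the command's output would be fed back to the model as the next turn. Here a plain function stands in for it, so the loop can be exercised without a model.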