by MKuykendall
287 days ago
GPU/CUDA: Yes, but it is disabled by default for faster builds. To enable it, remove LLAMA_CUDA = "OFF" from config.toml and rebuild with the CUDA toolkit installed.

Rust library: Absolutely! Add shimmy = { version = "0.1.0", features = ["llama"] } to Cargo.toml, then use the inference engine directly:

  let engine = shimmy::engine::llama::LlamaEngine::new();
  let model = engine.load(&spec).await?;
  let response = model.generate("prompt", opts, None).await?;

No need to spawn processes; just import the components and use them directly in your Rust code.
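
For anyone copying the dependency line, here is roughly where it would sit in a manifest. This is only a sketch of the fragment described above: the version and feature name are taken from this comment as written. The calls above use .await, so they need to run inside an async runtime; no runtime crate is listed here because the comment does not name one.

```toml
# Cargo.toml (fragment): dependency line as given in the comment above
[dependencies]
shimmy = { version = "0.1.0", features = ["llama"] }
```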