|
|
|
|
|
by 3Sophons
883 days ago
|
|
The Rust+Wasm stack provides a strong alternative to Python in AI inference. * Lightweight. Total runtime size is 30MB as opposed 4GB for Python and 350MB for Ollama.
* Fast. Full native speed on GPUs.
* Portable. Single cross-platform binary on different CPUs, GPUs and OSes.
* Secure. Sandboxed and isolated execution on untrusted devices.
* Modern languages for inference apps.
* Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
* OpenAI compatible. Seamlessly integrate into the OpenAI tooling ecosystem. Give it a try --- https://www.secondstate.io/articles/wasm-runtime-agi/ |
|
For ollama, llama2:7b is 3.8 GB. See: https://ollama.ai/library/llama2/tags. Still I see ollama requires less RAM to run llama 2