by 3Sophons 883 days ago
The Rust+Wasm stack provides a strong alternative to Python in AI inference.

* Lightweight. Total runtime size is 30MB, as opposed to 4GB for Python and 350MB for Ollama.
* Fast. Full native speed on GPUs.
* Portable. Single cross-platform binary on different CPUs, GPUs and OSes.
* Secure. Sandboxed and isolated execution on untrusted devices.
* Modern languages for inference apps.
* Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
* OpenAI compatible. Seamlessly integrate into the OpenAI tooling ecosystem.

Give it a try: https://www.secondstate.io/articles/wasm-runtime-agi/

2 comments

Interesting. But the GGUF file for Llama 2 is 4.78 GB in size.

For Ollama, llama2:7b is 3.8 GB. See: https://ollama.ai/library/llama2/tags. Still, I see that Ollama requires less RAM to run Llama 2.

Why would anyone downvote this? There is nothing against HN rules and the comment itself is adding new and relevant information.
From the HN Guidelines:

“Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.”

That user almost exclusively links to what appears to be their own product, which is self-promotion. They also do it without clarifying their involvement, which could come across as astroturfing.

Occasional self-promotion is fine, but it should be clearly stated as such. Doing it in a thread about a competing product is not ideal. If it came up naturally, that would be different from just interjecting a sales pitch.

I haven’t downvoted them, but I came close.