HN Mirror

Hacker News

by Eisenstein 625 days ago

You can see here:

    res = rest(ollama, {
        "model": "llava",
        "prompt": genprompt(box.name),
        "images": [box.export()],
        "stream": False
    })
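The `rest(ollama, {...})` helper comes from the project being discussed, so its exact behaviour isn't shown here. As a rough sketch, assuming it wraps Ollama's documented `POST /api/generate` endpoint on a local server at the default port 11434, an equivalent call using only the Python standard library might look like this. `build_llava_request`, `describe_image` and `OLLAMA_URL` are hypothetical names. The sketch also assumes it receives raw image bytes and base64-encodes them itself, whereas the original `box.export()` may already return an encoded image.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(prompt: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the llava model (hypothetical helper)."""
    payload = {
        "model": "llava",
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_image(prompt: str, image_bytes: bytes) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_llava_request(prompt, image_bytes)) as resp:
        # With stream=False the server replies with one JSON object.
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full generated text instead of a series of partial chunks.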

They are calling the ollama API to run Llava. Llava is a combination of an LLM base model and a vision projector (CLIP or ViT), and it is usually around 4-8 GB. Every generated token needs access to all of the model weights, and those don't fit in the Coral's small on-chip memory, so you would have to send 4-8 GB over USB to the Coral for every single token. Even at a generous 10 Gbit/s (1.25 GB/s), that is 8 GB / 1.25 GB/s = 6.4 seconds per token. A 150-token generation (a short paragraph) would take 150 × 6.4 s = 960 s, or 16 minutes.
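The bandwidth estimate above can be reproduced in a few lines of Python. This is a back-of-the-envelope sketch: it assumes the USB link is the only bottleneck and that every weight crosses it once per generated token, ignoring compute time, protocol overhead and any on-device caching. GB is taken as 10^9 bytes, and the function names are illustrative.

```python
def seconds_per_token(model_gb: float, link_gbit_s: float) -> float:
    """Time to stream all model weights over the link once (one full pass per token)."""
    link_gb_s = link_gbit_s / 8  # bits to bytes: 10 Gbit/s -> 1.25 GB/s
    return model_gb / link_gb_s


def generation_minutes(model_gb: float, link_gbit_s: float, n_tokens: int) -> float:
    """Total wall-clock time, in minutes, to generate n_tokens tokens."""
    return seconds_per_token(model_gb, link_gbit_s) * n_tokens / 60


if __name__ == "__main__":
    # Both ends of the 4-8 GB range, over a 10 Gbit/s link, for 150 tokens.
    for size_gb in (4, 8):
        spt = seconds_per_token(size_gb, 10)
        mins = generation_minutes(size_gb, 10, 150)
        print(f"{size_gb} GB model: {spt:.1f} s/token, {mins:.0f} min for 150 tokens")
```

Even the 4 GB end of the range only halves these figures, which still leaves several minutes per short paragraph.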

phito 625 days ago

Hm, yeah, sure. I didn't think of the LLM part. I don't think it's really useful, tbh.
