Hacker News
by simonw 6 days ago
I just ran one of these locally on a Mac like this:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu \
    --prompt="Generate an SVG of a pelican riding a bicycle"
The first time you run that it downloads 3.2GB to ~/.cache/huggingface/hub/models--litert-community--gemma-4-E2B-it-litert-lm
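
To see how much disk the download actually takes, something like this should work. It is only a sketch: it assumes the cache lives at the default path quoted above, and `MODEL_DIR` is just a variable name picked for this example.

```shell
# Path quoted in the post above; adjust it if your Hugging Face cache lives elsewhere.
MODEL_DIR="$HOME/.cache/huggingface/hub/models--litert-community--gemma-4-E2B-it-litert-lm"

if [ -d "$MODEL_DIR" ]; then
  # -s prints a single summary line, -h uses human-readable units
  du -sh "$MODEL_DIR"
else
  echo "Not downloaded yet: $MODEL_DIR"
fi
```

Deleting that directory should free the space again, at the cost of a fresh download on the next run.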

It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu --vision-backend gpu \
    --attachment image.jpg --prompt describe
And for audio:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu --audio-backend cpu \
    --attachment audio.wav --prompt transcribe
(The pelican is rubbish, but it's only a 3.2GB file so the fact it even outputs valid SVG is impressive to me: https://gist.github.com/simonw/94b318afde4b1ce5ff67d4b5d0362... )
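
The three commands above repeat the same repo, model file, and `--backend=gpu` arguments, so a small shell function can cut the typing. This is a rough sketch: `gemma_run` and the `DRY_RUN` switch are names made up for this example, not part of litert-lm, and the flags are copied unchanged from the commands above.

```shell
# Hypothetical helper (not part of litert-lm): wraps the arguments shared
# by the three commands above and forwards everything else via "$@".
gemma_run() {
  # If DRY_RUN is set to a non-empty value, print the command instead of running it.
  ${DRY_RUN:+echo} uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu "$@"
}

# Usage, mirroring the commands above:
#   gemma_run --prompt="Generate an SVG of a pelican riding a bicycle"
#   gemma_run --vision-backend gpu --attachment image.jpg --prompt describe
#   gemma_run --audio-backend cpu --attachment audio.wav --prompt transcribe
```

Setting `DRY_RUN=1` before a call prints the full command, which is handy for checking what would run before kicking off a 3.2GB download.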
3 comments

Not to mention the text-only 0.8GB version. Just crazy. You can now have basic real-time conversations on-device that are video- and audio-aware.
I'll be honest with you. My main ask for on-device AI is that when I am typing "Going out for a quick j" it corrects to "jog" and not "Jonathan". I don't think it needs that many gigabytes.
Who doesn't enjoy a quick Jonathan now and then?

But seriously, wouldn't predictive text on a 90s cell phone pass this test?

The autocomplete of a decade ago is better than what we have now.

It’s harder now because of emojis, draw-to-type, and pen input. We didn’t have these things 14 years ago, when “I’ll be right back” could be expanded from “I’ll b ri ba”.

0.8GB is for text only. It's more like ~1.1GB if you include the video/audio encoder.
And your point is what? That it’s more than the text-only 0.8GB if you include more than text only?
Their point is that OP used the same dot-separated phrase to point out that there's a 0.8GB model and an audio/image model on device, which reads weird.
Have you seen a 0.8GB model file floating around yet? I couldn't find one earlier.
I think this is the one, but it’s 0.8GB of VRAM, not a 0.8GB file size.

https://huggingface.co/google/gemma-4-E2B-it-qat-mobile-ct

But they could be cooking up a smaller one: the model card lists the Q_4 quants as bigger than the mobile or text-only versions, so I think we’ll need to wait for the Q_2_Distilled_Mobile_Textformer version. Still, just amazing work.

Where is it? On ollama I see only the bigger one.
I don’t use ollama, can you pull from HF?
Is that actually QAT? The MLX Community models have that in their names, but these don't, and the upload dates don't quite line up.
As an aside uvx is so pleasant to use... I wish Nvidia supported it as first-class rather than making folks jump through Docker hoops.
I wish people would stop using Python for AI.

It's slow and the package resolution is way too flat.

What do you use?