HN Mirror

by ionwake 43 days ago

I have an M4 MacBook Air with 32 GB of memory.

These are my current results for my models:

  ┌──────────────────────┬───────────┬─────────────┐
  │        Model         │   Size    │ Tokens/sec  │
  ├──────────────────────┼───────────┼─────────────┤
  │ gemma-4-e4b-it-mlx   │ ~4B (MLX) │ ~10.5 tok/s │
  ├──────────────────────┼───────────┼─────────────┤
  │ qwen3-8b-uncensor-v2 │ 8B        │ ~6.3 tok/s  │
  ├──────────────────────┼───────────┼─────────────┤
  │ qwen3-14b-uncensored │ 14B       │ ~3.5 tok/s  │
  └──────────────────────┴───────────┴─────────────┘

I seem to be doing ok with the Gemma model for file parsing / handling.
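
For a rough sense of what these rates mean in practice, here is a minimal sketch that converts tokens/sec into wait time per response. The rates are the approximate figures from the table above. The 500-token response length and the function name `seconds_to_generate` are assumptions made for illustration, not measurements, and the calculation ignores prompt processing time.

```python
# Approximate generation rates from the table above (tokens per second).
RATES_TOK_PER_SEC = {
    "gemma-4-e4b-it-mlx": 10.5,
    "qwen3-8b-uncensor-v2": 6.3,
    "qwen3-14b-uncensored": 3.5,
}

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 500


def seconds_to_generate(num_tokens: int, tok_per_sec: float) -> float:
    """Return the time in seconds to generate num_tokens at a steady rate."""
    if tok_per_sec <= 0:
        raise ValueError("tok_per_sec must be positive")
    return num_tokens / tok_per_sec


for model, rate in RATES_TOK_PER_SEC.items():
    wait = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{model}: {wait:.0f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, the fastest model here takes a little under a minute per 500-token response, and the 14B model takes over two minutes.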

1 comment

ActorNightly 43 days ago

<=10 tok/sec is unusable. You would be faster writing the code yourself.
