by robgough 20 days ago
As clever as this is, it seems like the names are fairly straightforward (as you'd want!) – did you try using the on-device Apple Foundation model at all? That's actually pretty powerful for a use case like this, and if you're happy to require the user has Apple Intelligence turned on already, your shipped app can end up being tiny. The biggest concern for an app like this is how much RAM you end up using trying to run it. Especially if we end up with lots of different apps all doing the same thing.

Being able to super-power apps with on-device models is a lot of fun. I recently did the same while building my own dictation app using small local models, and I still can't believe how effective it is. The download is just 20 MB, though it then downloads Parakeet (~475 MB) for audio. It can use the on-device model as the second-pass LLM, which works pretty well (though better models are available to download and use, e.g. Llama 3.2 4-bit and Qwen 2.5 7B 4-bit).

I'm currently building a little tool for a professional photographer friend to go through and classify images in their photoshoots, so I can build a searchable db for them to quickly find very specific images in the future. I simply don't think it would have been possible for me to build a tool like that just a couple years ago at any price.
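For anyone curious what the "searchable db" part might look like, here is a minimal sketch, assuming the classifier has already produced a list of text labels per image. The table name `image_labels`, the helper names, and the example paths and labels are all hypothetical and not taken from the actual tool.

```python
import sqlite3


def build_index(conn: sqlite3.Connection) -> None:
    # One row per (image, label) pair; the primary key prevents duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_labels ("
        "path TEXT NOT NULL, label TEXT NOT NULL, "
        "PRIMARY KEY (path, label))"
    )


def add_labels(conn: sqlite3.Connection, path: str, labels: list[str]) -> None:
    # Labels are lower-cased so that searches are case-insensitive.
    conn.executemany(
        "INSERT OR IGNORE INTO image_labels (path, label) VALUES (?, ?)",
        [(path, label.lower()) for label in labels],
    )
    conn.commit()


def find_images(conn: sqlite3.Connection, labels: list[str]) -> list[str]:
    # Return the images that carry ALL of the requested labels.
    wanted = sorted({label.lower() for label in labels})
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT path FROM image_labels WHERE label IN ({placeholders}) "
        "GROUP BY path HAVING COUNT(DISTINCT label) = ? ORDER BY path",
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]
```

The labels themselves would come from whichever local model does the classification; the sketch only covers storing and querying them.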

1 comments

Thanks for the feedback. I did not know my Mac had an on-device Apple Foundation model. Is it multimodal? I'll be checking it out and comparing it with Google Gemma 4. I thought Apple was out of the AI model race.

The idea is to ship more powerful lightweight free models as they become available. I'm looking forward to Gemma 5!

> The biggest concern for an app like this is how much RAM you end up using trying to run it

You are totally right. A future version could unload the model when the app is idle and only load it again the next time the user takes a screenshot. It is a trade-off between the latency of generating the names and RAM usage.
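The idle-unload idea could be sketched roughly like this. It is an illustration only, not the app's actual code: `IdleUnloadingModel`, `loader`, and `idle_seconds` are hypothetical names, and the "model" is just any callable returned by the loader.

```python
import time
from typing import Any, Callable, Optional


class IdleUnloadingModel:
    """Loads a model lazily and drops it after a period of inactivity."""

    def __init__(
        self,
        loader: Callable[[], Callable[[Any], Any]],
        idle_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[Callable[[Any], Any]] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def run(self, request: Any) -> Any:
        # The first call after an unload pays the load latency again.
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model(request)

    def unload_if_idle(self) -> bool:
        # Call periodically (e.g. from a timer); frees the model once idle.
        if self._model is None:
            return False
        if self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False
```

A larger `idle_seconds` keeps the model in RAM longer but makes back-to-back screenshots fast, while a smaller value frees memory sooner at the cost of reloading more often.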

It's not as powerful as Gemma 4, but I think they likened it to GPT-3. It's perfectly capable of looking at images and classifying them at the level you'll need for this app. And it runs everything on the Apple Neural Engine, so it's decently quick. Of course, this assumes that your users are on Apple Silicon (I believe that's the limitation), and they must have enabled Apple Intelligence, which downloads the model at that point.