by cpursley 181 days ago
That's neat, using Apple Foundation Models or something else? I'm very curious about how it's determining folder matches (I need to do something for images that are already classified/tagged via FastVLM) in iOS.

Not Apple Foundation Models. Unfortunately, they’re not capable enough (yet) to understand content and match it to folders.

I’m using SBERT-style embedding models for the semantic matching, which works very well in practice.
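
To make the idea concrete, here is a minimal sketch of embedding-based folder matching. It is not the app's actual code: the vectors are hand-made toy stand-ins for what an SBERT-style model would produce, and `best_folder` and `min_score` are hypothetical names chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat as no match
    return dot / (norm_a * norm_b)

def best_folder(doc_vec, folder_vecs, min_score=0.5):
    """Return the folder whose embedding is closest to doc_vec,
    or None if no folder scores above min_score (hypothetical threshold)."""
    best_name, best_score = None, min_score
    for name, vec in folder_vecs.items():
        score = cosine(doc_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy 3-dimensional vectors (assumption); real sentence embeddings
# typically have hundreds of dimensions.
folders = {
    "Invoices": [0.9, 0.1, 0.0],
    "Travel":   [0.1, 0.9, 0.2],
    "Recipes":  [0.0, 0.2, 0.9],
}
doc = [0.8, 0.2, 0.1]

print(best_folder(doc, folders))
```

In a real setup the folder vectors would presumably come from embedding each folder's name or a sample of its contents, and the threshold keeps weak matches from being filed anywhere.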

For non-text content, the app also analyzes images (OCR + object recognition) using Apple’s Vision framework. That part is surprisingly powerful, especially on Apple Silicon.

> I need to do something for images that are already classified/tagged via FastVLM

What’s the concrete use case you’re targeting with this?

Classifying real estate / property images. I'm also using Apple Vision, which ain't half-bad for something on-device, and feeding that metadata, along with what FastVLM returns, into the Foundation model to turn it into structured output. Trying to see how far I can push that, but it feels pretty limited/dated in terms of capabilities vs. leading-edge models.
I’ve seen a huge advantage in running everything fully local and private. Not sure if that fits your use case, though. Nearly 90% of Floxtop users choose the app mainly for that privacy focus.