Hacker News new | ask | show | jobs
by pbgcp2026 35 days ago
I'm sorry to spoil it for you, but Perl script was able to do all of that like ... 10 years ago? The out-of-the-box Shotwell manages photos quite well without any intelligence. The problem, as people mentioned above, is SOTA models cognitive and tooling abilities. Also, have you noticed as top-end Mac Studios got downgraded recently? They don't want you to have access to frontier models. And you will not have it. See Mythos as Exibit A.
9 comments

The Mac Studio's disappearance is related to the fact that people now want them for the purpose of running local models. Supply and demand. That plus Apple doesn't shift prices for released products, and it essentially became underpriced when large RAM quantities exploded in price. For the price of 512GB of RAM alone you could get an M3 Ultra with 512GB of unified memory in a nice, quiet, and power efficient package. With the RAM you still need to spend a few thousand more on CPU/GPU, power supplies, storage and case.

Also the fact that an M5 version will be coming, and they likely know they are going to sell out on day one (I expect we'll see a price correction from Apple for higher end configs of M5 studios, base price will probably stay the same), so they need to build up stock reserves.

512GB of ram with I think 600GB/s access. It’s the bandwidth that makes the studio killer for inference.
> The out-of-the-box Shotwell manages photos quite well without any intelligence.

This piqued my interest on how it does it and after briefly checking the project it seems it only has two features for automatic photo categorization. 1) it can group photos by date and 2) It has face detection and recognition that uses trained weights (so ML "intelligence").

Immich (server) also has a whole host of ML features for classification as well.

I got away from google images and upload to my own Immich instance.

I also use an open source camera app on fdroid to degoogle that whole path.

> They don't want you to have access to frontier models. And you will not have it. See Mythos as Exibit A.

"They" fully well know that they current frontier model are maybe 6 month ahead of what people will have access to without their control. See Deepseek as Exibit B

The reason you can't run these locally are more with the fact that those mythos sized models require extreme amount of memory and processing power to run at acceptable speeds. And neither you, nor I can afford to pay for those resources to run those models locally. A big reason is that "running locally" means running on your own hardware. And for almost everyone this means "running on hardware that will spent a big portion of its time just sleeping". Because data center and providers have higher utilization rates, they can easily outpace you. That and the fact that when they place an order it's usually for hundreds of thousands of units.

I am convinced the (mainly chinese) open weights models are the only reason OpenAI and Anthropic release at the pace they do. Without them being on their heels, we would have seen a stagnant duopoly in terms of public releases.

That is why the huge lobby machine is grinding away to make those models illegal.

Although, I wonder how many orders of magnitude in terms of affordability the utilization rate actually gets them. Realistically if you use a self-hosted LLM for your job, you might be using it, what, a solid 6 hours per day? Assuming you can keep it actually fed, while working (so, some agentic thing might be necessary, I guess it will need to be more than VSCode autocomplete and responding to individual prompts). Anyway, that starts you out at 1/4’th the utilization, a 4X price increase might be worth paying for privacy and stability (no sudden change in model behavior, no price changes, no days when the system is over-utilized for reasons outside your control).

Rather I think it is just hard for local LLMs to compete in this early stage when the cloud providers are allowed by investors to be unprofitable.

> Realistically if you use a self-hosted LLM for your job, you might be using it, what, a solid 6 hours per day?

You can grow the utilization rate well beyond that if you don't always care about getting a quick, real-time response. (And if you do, then maybe the cloud model was the better deal after all!)

Isn't Mythos that screw up where Anthropic failed to ship something that was no better than the product OpenAI launched a few weeks later?

And, assuming the allegations are true, don't things like Deepseek and Qwen offer existence proofs that frontier models are (and will forever be) trivially distilled down to run domain-specific tasks on boxes that cost a few months of Claude Max subscription?

>Also, have you noticed as top-end Mac Studios got downgraded recently? They don't want you to have access to frontier models. And you will not have it.

Isn't that a function of RAM supply not being available now?

OpenAI did buy out the RAM supply to block competition. Arguably local models are one of its (smaller) competitors.

Even if that weren't the case, every corp _needs_ you to be on a subscription.

They didn't really even buy the RAM. But there's pretty significant demand for RAM in general with data centers being planned left and right.
Do we even have decent OCR nowadays? Any free solutions?
The latest rounds of open weights vision language models are incredibly good. Like, massively good. Open weights vision capabilities trade blows with frontier models. Over the last few months I'd roughly rank capabilities as Gemini -> {chatgpt and SoTa open weights models} -> Claude.

qwen3.5-2b and qwen3.5-4b are great at document parsing. They can run on CPU

qwen3.6-27b and gemma4-31b are borderline better than the human eye in some cases. Their OCR isn't perfect, but they're seriously good. They can still run on the CPU but you'll be waiting minutes per document.

You can demand JSON, YAML, MD, or freeform text just by varying the prompt. Even if you have a custom template, you can just put that in the prompt and they'll do an OK-ish job.

There's also models that aren't in the r/locallama zeitgeist. IBM released a new 4b parameter model for structured text extraction last week, and there's a sea of recent chinese OCR models too.

IMO the open wights models are so good that in a lot of cases it's not worth paying frontier labs for OCR purposes. The only barrier to entry is the effort to set up a pipeline, and havin the spare CPU/GPU capacity.

Many of the open-weights LLMs accept either text or images as input.

Besides those, there are a few smaller open-weights models that are dedicated for OCR tasks, for instance DeepSeek-OCR-2 and IBM granite-vision-4.1-4b. (They can be found on huggingface.co)

The dedicated vision models can be run on much cheaper hardware, including smartphones, than the big models that can process images besides text.

Similarly, besides bigger multimodal models, that can accept audio, images or text as imput, there are smaller open-weights models that are dedicated for speech recognition, e.g. Xiaomi MiMo-V2.5-ASR and IBM granite-speech-4.1-2b.

Depends on your use case. My procuction runs satisfactory on a local docling-serve ( https://github.com/docling-project/docling-serve ), but that is mostly easy relatively clean scans of decently typeset documents with some typical scanning artefacts.
The qwen models not only have good OCR, they will describe pictures to you.
They not not only describe pictures. They can analyze pictures. Detect anomalies. Create 3d models out of it.
Anyone wanna do a quick offline MVP on a general vision assistant for the blind? We've had things like Google Lens for a while, but it's a bit vision and touchscreen-centric.
API for Mythos and GPT Cyber are circulating in the market (That's also why we can use Claude and GPT in China). The open source community has been advancing subscription engineering for a long time, and I don't think Anthropic or OpenAI have any technical advantage in this field.
Huh? Why would Apple not want you to be able to run local models? They have very deliberately stayed the hell away from this space.
The conspiracy angle here is not really relevant. Ram is expensive and they're gearing up for M5 studios. Not the illuminati keeping better LLM models out of your hands.
They did decrease the memory bandwidth for.... reasons... which didn't make much sense.. but yeah this is some pretty weird conspiracy stuff.

Apple doesn't even sell a model. They just have a deal to use Googles. They can't "protect" their cloud version of a model they don't have.

You think Apple doesn't want you to use local models?

That's an interesting way to view the world. I mean, utterly stupid as it is, but interesting.

But the previous sentence is even stupider (a Perl script 10 years ago could write code like Qwen does now?), so I guess at least it's consistent.