by Imnimo 523 days ago
It's tough to judge without seeing examples of the targets and the user photos, but I'm curious if this could be done with just old-school SIFT. If it really is exactly the same image in the corpus and on the wall, does a neural embedding model really buy you a lot? A small number of high-confidence tie points seems like it'd be all you need, but it probably depends a lot on just how challenging the user photos are.
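For illustration, here is a minimal sketch of the "few high-confidence tie points" idea, using Lowe's ratio test on descriptor vectors. It is not the article's method. The function names, the 0.75 ratio, and the `min_matches` threshold are assumptions chosen for the example, and the descriptors are plain tuples of floats; in practice they would come from a SIFT implementation (for example OpenCV's `cv2.SIFT_create().detectAndCompute`), and a geometric consistency check would normally follow.

```python
import math

def ratio_test_matches(query_desc, corpus_desc, ratio=0.75):
    """Return (query_idx, corpus_idx) pairs that pass Lowe's ratio test.

    Descriptors are equal-length sequences of floats (hypothetical input
    format; a real pipeline would use 128-dimensional SIFT descriptors).
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Distance from this query descriptor to every corpus descriptor.
        dists = sorted((math.dist(q, c), ci) for ci, c in enumerate(corpus_desc))
        if len(dists) < 2:
            continue  # the ratio test needs a runner-up to compare against
        (best, best_idx), (second, _) = dists[0], dists[1]
        # Keep the match only if the best candidate clearly beats the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

def looks_like_same_image(query_desc, corpus_desc, min_matches=4):
    """Declare a hit when enough unambiguous tie points survive (assumed threshold)."""
    return len(ratio_test_matches(query_desc, corpus_desc)) >= min_matches

# Toy 2-D descriptors: two query points sit next to corpus points,
# the third is equidistant from all of them and should be rejected.
corpus = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
query = [(0.1, 0.0), (10.0, 0.1), (5.0, 5.0)]
print(ratio_test_matches(query, corpus))
```

The ambiguous third point is dropped because its best and second-best distances are equal, which is the property that makes a small number of surviving matches trustworthy.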

I find a lot of applied AI use-cases to be "same as this other method, but more expensive".
It's often vastly more expensive to inference, but vastly cheaper and faster to train / set up.

Many LLM use cases could be solved by a much smaller, specialized model and/or a bunch of if statements or regexes, but training the specialized model and coming up with the if statements requires programmer time, an ML engineer, human labelers, an eval pipeline, MLOps expertise to set up the GPUs, etc.

With an LLM, you spend 10 minutes to integrate with the OpenAI API, and that's something any programmer can do, and get results that are "good enough".

If you're extremely cash-poor, time-rich and have the right expertise, making your own model makes sense. Otherwise, human time is more valuable than computer time.
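To make the trade-off concrete, here is a rough break-even sketch. Every number in it is hypothetical and only there to show the arithmetic: a custom model with a high setup cost and a low per-request cost, against an LLM API with a low setup cost and a higher per-request cost. The function name and parameters are made up for this example.

```python
def break_even_requests(setup_custom, per_req_custom, setup_llm, per_req_llm):
    """Request count above which the custom model is cheaper in total.

    Returns None if the custom model never catches up, i.e. it is not
    cheaper per request than the LLM.
    """
    extra_setup = setup_custom - setup_llm        # extra money spent up front
    saving_per_req = per_req_llm - per_req_custom  # money saved on each request
    if saving_per_req <= 0:
        return None
    return max(0.0, extra_setup / saving_per_req)

# Hypothetical figures: $20,000 of engineering and labelling vs. $200 of
# integration work, $0.001 vs. $0.01 per request.
n = break_even_requests(setup_custom=20_000, per_req_custom=0.001,
                        setup_llm=200, per_req_llm=0.01)
print(f"custom model pays off after about {n:,.0f} requests")
```

With these made-up numbers the custom model only wins past roughly 2.2 million requests, so a project that never reaches that volume is better off with the API.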

That was happening even when they were still calling it machine learning in the papers, and long before that too. It’s the way some people reliably get papers out, for better or worse. Find a known phenomenon with existing published methods, use the same dataset, potentially with the new method of the day, show there’s a little agreement between the old “gold standard” and your method, and boom, a new paper for your CV on $hotnewmethod that you can now land jobs with. Never mind that no one will cite it. That’s not the point here.
Better to spend $100 in op-ex money than spend $1 in cap-ex money reading a journal paper, especially if it lets you tell investors "AI." :p
Your engineers cost <$1/hr and understand journal papers?
The 100-vs-1 is a ratio.
Use cases such as?
I'm in an AI focused education research group, and most "smart/personalized tutors" on the market have similar processes and outcomes as paper flashcards.
From TFA:

> LLMs and the platforms powering them are quickly becoming one-stop shops for any ML-related tasks. From my perspective, the real revolution is not the chat ability or the knowledge embedded in these models, but rather the versatility they bring in a single system.

Why use another piece of software if an LLM is good enough?

Performance. A museum visitor may not have a good internet connection, so any solution that involves uploading a photo to a server will probably be (much) slower than client-side detection. There’s a thin line between a magical experience and an annoying gimmick. Making people wait for something to load is a sure way to cross that line.

Also privacy. Do museum visitors know their camera data is being sent to the United States? Is that even legal (without consent) where the museum is located? Yes, visitors are supposed to be pointing their phone at a wall, but I suspect there will often be other people in view.

Cost. Same reason you don't deliver UPS packages with B-2 bombers.
The cost of LLM inference is cheap and will continue to decrease. More traditional methods take up far more of an engineer's time (which also costs money).

If I have a project with a low enough number of lifetime inputs, I'm not wasting my time labelling data and training a model. That time could be better spent working on something else. As long as the evaluation is thorough, it doesn't matter. But I still like doing some labelling manually to get a feel for the problem space.