by groby_b 673 days ago
This is a horrible use case for Windows Recall. Even if we ignore all the privacy implications of having a third party screenshot you every 30 seconds and make the files world-readable, it's a bad idea.

Recall has lost a ton of useful metadata you already have - both URL visits and streaming are clearly discernible actions, both at the network stack level and from your browser history. Throwing that away and trusting an LLM to re-infer the same data both reduces data fidelity and significantly increases processing cost.
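To make the "metadata you already have" point concrete, here is a minimal sketch of pulling recent URL visits straight out of a browser history database. It assumes a Chromium-style SQLite file with a `urls` table holding `url`, `title`, and `last_visit_time` (microseconds since 1601-01-01 UTC); the function names are made up for illustration, and other browsers use a different schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: Chromium-style timestamps, i.e. microseconds since 1601-01-01 UTC.
CHROMIUM_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chromium_time_to_datetime(microseconds: int) -> datetime:
    """Convert a Chromium-style timestamp to an aware UTC datetime."""
    return CHROMIUM_EPOCH + timedelta(microseconds=microseconds)


def recent_visits(conn: sqlite3.Connection, limit: int = 10):
    """Return (url, title, last_visit) tuples, most recent first.

    Assumes a `urls` table with `url`, `title`, `last_visit_time` columns.
    """
    rows = conn.execute(
        "SELECT url, title, last_visit_time FROM urls "
        "ORDER BY last_visit_time DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(url, title, chromium_time_to_datetime(ts)) for url, title, ts in rows]
```

In practice you would open a copy of the history file, since the browser usually keeps the live one locked. The URL, title, and timestamp come back exact, with no inference step.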

If you want to see this done reasonably well, I'd suggest looking at e.g. https://beepb00p.xyz/promnesia.html (which, not surprisingly, bears a strong similarity to what the article discusses).

LLMs don't add much value here, outside of tightly locked down systems where screenshots are the only way of exporting.


Sorry, when I said something like Windows Recall, I didn't mean Windows Recall itself but software with similar capabilities. I was imagining some sort of ongoing screen capture along with a meta prompt or prompts, and some sort of output.

The value the LLM adds is interpreting/processing data without having to tailor input streams. Imagine if formats change, fields get renamed, and so on. The maintenance would be a headache if this were done at a per-service level. I think the reduction in fidelity seems like a reasonable tradeoff, but that's for the user to decide, of course, along with local/cloud processing and proprietary/open source software.

Even things like invoices from the same service change format over time.

I've been using https://www.manictime.com for maybe close to 20 years now, although not the pro version that offers screenshot recording (curiously the website doesn't mention the existence of a free "standard" license). It records window titles and presence/away times.

A prompt every few minutes that would ask "What are you doing now?" would be interesting to me, as a professional procrastinator. Maybe an even better one would be one that says something like "In the last 10 minutes, you spent 90% of it on Hacker News".
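A nudge like that needs nothing more than the window-title and presence data a tracker already records. A rough sketch, assuming the tracker can export `(timestamp, app_name)` samples sorted by time, where each sample lasts until the next one (the function names and data shape are hypothetical, not ManicTime's actual export format):

```python
from collections import Counter
from datetime import datetime, timedelta


def usage_summary(samples, now, window=timedelta(minutes=10)):
    """Return (app, percent_of_window) for the most-used app, or None.

    samples: list of (timestamp, app_name), sorted by timestamp.
    Each sample is assumed to stay active until the next sample (or `now`).
    """
    start = now - window
    totals = Counter()
    # Pair each sample with the start of the next one to get its duration.
    for (ts, app), (next_ts, _) in zip(samples, samples[1:] + [(now, None)]):
        lo, hi = max(ts, start), min(next_ts, now)  # clip to the window
        if hi > lo:
            totals[app] += (hi - lo).total_seconds()
    if not totals:
        return None
    app, seconds = totals.most_common(1)[0]
    return app, round(100 * seconds / window.total_seconds())


def nudge(samples, now, window=timedelta(minutes=10)):
    """Format the summary as a one-line reminder, or return None."""
    summary = usage_summary(samples, now, window)
    if summary is None:
        return None
    app, percent = summary
    minutes = int(window.total_seconds() // 60)
    return f"In the last {minutes} minutes, you spent {percent}% of it on {app}"
```

Time with no sample in the window (e.g. the tracker was off) simply counts against the percentage, which seems like the honest default for a procrastination nudge.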

The (non-privacy related) issue is the same - if you resort to screenshots, you've thrown away tons of valuable metadata.

And, as long as there's an API, I am fairly certain that maintaining a compat layer is a lot less work than retuning the LLM when the images change. (And you'll want to adjust your tuning, at least with current SOTA, or your error rate will reach Unpleasantville fairly quickly.)

Yes, it seems easier on the face of it - but the reality of building an LLM pipeline will quickly surface a lot of edge cases.
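For a sense of scale, the compat layer for renamed fields can be about this small. This is a sketch with made-up field names, not any particular service's API: each canonical field lists the names it has gone by over time, and a rename upstream becomes a one-line change.

```python
from typing import Any, Mapping

# Hypothetical aliases: canonical field -> names the service has used over time.
FIELD_ALIASES = {
    "url": ("url", "page_url", "link"),
    "title": ("title", "page_title"),
    "visited_at": ("visited_at", "timestamp", "ts"),
}


def normalize(record: Mapping[str, Any]) -> dict:
    """Map a service record onto canonical field names.

    Raises KeyError if a canonical field has no known alias in the record,
    so a format change fails loudly instead of silently degrading.
    """
    out = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for name in aliases:
            if name in record:
                out[canonical] = record[name]
                break
        else:
            raise KeyError(f"no known field for {canonical!r} in {sorted(record)}")
    return out
```

The loud failure is the point: when the format changes you get an exception naming the missing field, rather than a slowly rising error rate you have to notice on your own.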