by jzapletal
123 days ago
We built an open-source tool that screenshots your desktop and feeds summaries to Claude/Cursor via MCP. What surprised us:

- Cost: $0.0002/screenshot (we budgeted 100x more). Guess cloud vision APIs got cheap fast.
- CPU: 5% (we expected 50%), and the laptop stays cool.
- Quality: night and day vs. local models. We tried running vision locally first and it was mediocre.

It works by triggering a screenshot on activity, sending it to a cloud vision model for summarization, then deleting the screenshot and storing only the text in local SQLite. You query it via MCP ("what was I working on before lunch?") and Claude actually knows.
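A minimal sketch of the summarize, delete, store flow described above might look like the following. It is illustrative only and not the tool's actual code: the `activity` table, the function names, and the `summarize` callable (a stand-in for whatever cloud vision call is used) are all assumptions, and the screenshot trigger and MCP server are left out.

```python
import os
import sqlite3
import tempfile
import time
from typing import Callable, List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local SQLite store. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity ("
        "ts REAL NOT NULL, summary TEXT NOT NULL)"
    )
    return conn


def record_screenshot(
    conn: sqlite3.Connection,
    image_path: str,
    summarize: Callable[[bytes], str],
    now: Optional[float] = None,
) -> str:
    """Summarize one screenshot, delete the image, keep only the text."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    # `summarize` is a placeholder for the cloud vision model call.
    summary = summarize(image_bytes)
    # The image is removed as soon as a summary exists; only text is stored.
    os.remove(image_path)
    ts = time.time() if now is None else now
    conn.execute(
        "INSERT INTO activity (ts, summary) VALUES (?, ?)", (ts, summary)
    )
    conn.commit()
    return summary


def summaries_between(
    conn: sqlite3.Connection, start: float, end: float
) -> List[Tuple[float, str]]:
    """Return (timestamp, summary) rows with start <= ts < end, oldest first."""
    rows = conn.execute(
        "SELECT ts, summary FROM activity "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return rows.fetchall()


if __name__ == "__main__":
    # Fake screenshot and fake summarizer, just to exercise the flow.
    fd, shot = tempfile.mkstemp(suffix=".png")
    os.write(fd, b"fake image bytes")
    os.close(fd)
    store = open_store()
    record_screenshot(store, shot, lambda b: "editing a draft in the terminal")
    print(summaries_between(store, 0.0, time.time() + 1.0))
```

A query like "what was I working on before lunch?" would then reduce to a time-range lookup such as `summaries_between(conn, morning_start, noon)`, with the returned text handed to the model as context.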
Which local models did you try? GLM-OCR seems like it would excel at this: https://huggingface.co/zai-org/GLM-OCR