by gbro3n 64 days ago
I've used OpenClaw (just for learning; I agree with the author that it's not reliable enough to do anything useful) but also have a similar daily summary routine, which is a basic Gemini API call to a personal MCP server that has access to my email, calendar etc. The latter is so much more reliable. OpenClaw flows sometimes nail it, and then the next day fail miserably. It seems like we need a way to 'bank' the correct behaviours - like 'do it like you did it on Monday'. I feel that for any high level of reliability, we will end up moving towards using LLMs as glue, with as much of the actual work as possible being handed off to MCP or persisted routine code. The best use case for LLMs currently is writing code, because once it's written, tested and committed, it's useful for the long term. If we had to generate the same code on the fly for every run, there's no way it would ever work reliably. If we extrapolate that idea, I think it helps to see what we can and can't expect from AI.
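To make the 'bank the correct behaviours' idea concrete, here is a minimal Python sketch of one way it might work. It is not how OpenClaw or any MCP server does it. Everything named here (`RoutineBank`, `run_task`, `plan_with_llm`, the tool names) is hypothetical. The assumption is that the LLM only proposes an ordered list of tool names, and that a plan a human has approved is saved to disk and replayed on later runs instead of being regenerated.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


class RoutineBank:
    """Stores approved plans (ordered lists of tool names) as JSON files."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, task: str) -> Path:
        return self.directory / f"{task}.json"

    def lookup(self, task: str) -> list[str] | None:
        """Return the banked plan for `task`, or None if nothing is banked."""
        path = self._path(task)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def bank(self, task: str, plan: list[str]) -> None:
        """Persist a plan that a human has reviewed and approved."""
        self._path(task).write_text(json.dumps(plan), encoding="utf-8")


def run_task(
    task: str,
    bank: RoutineBank,
    plan_with_llm: Callable[[str], list[str]],  # hypothetical LLM planner
    tools: dict[str, Callable[[], str]],        # deterministic, persisted code
) -> tuple[list[str], list[str]]:
    """Replay the banked plan if one exists; otherwise ask the LLM for one."""
    plan = bank.lookup(task)
    if plan is None:
        plan = plan_with_llm(task)
    results = [tools[name]() for name in plan]
    return plan, results
```

After a run that went well ('Monday'), calling `bank.bank(task, plan)` pins that plan, so later runs skip the LLM planner entirely and only execute the persisted tools.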

This is interesting. I haven't used OpenClaw but I set up my own autonomous agent using Codex + ChatGPT Plus + systemd + normal UNIX email and user account infrastructure. And it's been working great! I'm very happy with it. It's been doing all kinds of tasks for me, effectively as an employee of my company.

I haven't seen any issues with memory so far. Using one long rolling context window, a diary and a markdown wiki folder seems sufficient to have it do stuff well. It's early days still and I might still encounter issues as I demand more, but I might just create a second or third bot and treat them as 'specialists' as I would with employees.
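For readers wondering what a 'diary plus markdown wiki' memory might look like in practice, here is a rough Python sketch. It is a guess at the general shape, not the setup described above. The file layout (`diary.md`, a `wiki/` folder) and the function names are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def append_diary(home: Path, entry: str, now: datetime | None = None) -> None:
    """Append a timestamped entry to the agent's diary (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    with (home / "diary.md").open("a", encoding="utf-8") as f:
        f.write(f"## {now:%Y-%m-%d %H:%M}\n{entry.strip()}\n\n")


def build_wakeup_context(home: Path, diary_chars: int = 4000) -> str:
    """Build the text shown to the agent on wakeup.

    Lists the wiki page names (so the agent can decide what to open) and
    includes only the tail of the diary, keeping the prompt bounded.
    """
    diary = home / "diary.md"
    diary_text = diary.read_text(encoding="utf-8") if diary.exists() else ""
    wiki_dir = home / "wiki"
    pages = sorted(wiki_dir.glob("*.md")) if wiki_dir.is_dir() else []
    index = "\n".join(f"- {page.name}" for page in pages)
    return f"# Wiki pages\n{index}\n\n# Recent diary\n{diary_text[-diary_chars:]}"
```

Listing page names rather than inlining every page is one way to keep a long rolling context from filling up with notes that are not needed on a given wakeup.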

I did (using Claude Code) something that sounds very similar to this. It’s a bunch of bootstrapped Unix tools, systemd units, and some markdown files. Two comments:

- I suspect that in this moment, cobbling together your own simple version of a “claw-alike” is far more likely to be productive than a “real” claw. These are still pretty complex systems! And if you don’t have good mental models of what they’re doing under the hood and why, they’re very likely to fail in surprising, infuriating, or downright dangerous ways.

For example, I have implemented my own “sleep” context compaction process and while I’m certain there are objectively better implementations of it than mine… mine is legible to me, and therefore I can predict with some accuracy how my productivity tamagotchi will behave day-to-day, in a way that I could not if I hadn’t been involved in creating it.

(NB: I expect this is a temporary state of affairs while the quality gap between homemade and “professional” just isn’t that big.)

- I do use mine as a personal assistant, and I think there is a lot of potential value in this category for people like me with ADD-style brains. For whatever reason, explaining in some detail how a task should be done is often much easier for me than just doing the task (even if, objectively, there’s equal or higher effort required for the former). It therefore doesn’t do anything I _couldn’t_ do myself. But it does do stuff I _wouldn’t_ do on my own.
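The “sleep” compaction mentioned in the first point isn’t spelled out in the comment, so the following Python sketch is only one simple way such a step could work, under stated assumptions: the history is a plain list of strings, and `summarize` is a hypothetical callable (in practice probably an LLM call) that condenses the older messages.

```python
from __future__ import annotations

from typing import Callable


def compact(
    messages: list[str],
    keep_recent: int,
    summarize: Callable[[list[str]], str],  # hypothetical summarizer
) -> list[str]:
    """Replace all but the last `keep_recent` messages with one summary line."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(messages) <= keep_recent:
        return list(messages)  # nothing old enough to compact
    cut = len(messages) - keep_recent
    old, recent = messages[:cut], messages[cut:]
    summary = f"[summary of {len(old)} earlier messages] {summarize(old)}"
    return [summary] + recent
```

The appeal of something this small is the legibility described above: it is easy to predict exactly what the agent will and will not remember after a “sleep”.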

Right - I think email is a much better UI than Slack or WhatsApp or Discord for that reason. It forces you to write properly and explain what you want, instead of firing off a quick chat. Writing things down helps you think. And because coding harnesses like Codex are very good at interacting with their UNIX environments but are also kinda slow, email's higher latency expectations are a better fit for the underlying technology.
Any chance you might put this on GH? Sounds really interesting.
Maybe but it's so simple I'm not sure it's worth it. You can easily make your own!
What sort of tasks do you have it do for you?
Two categories: actual useful work for the company, and improving the bot's own infrastructure.

Useful work includes: bug triage, matching up external user bug reports on GitHub to the internal YouTrack, fixing easy-looking bugs, and working on a redesign of the website. I also want to extend it to handling the quarterly accounting (which is already largely automated with AI, but I still need to run the scripts myself), preparing answers to support queries, and more work on bug fixes and features. It has access to the bug tracker, internal git and CI system as if it were an employee, and uses all of those quite successfully.

Meta-work has so far included: making a console so I can watch what it's doing when it wakes up, regularly organizing its own notes and home directory, improving the wakeup rhythm, and packaging up its infrastructure to a repeatable install script so I can create more of them. I work with a charity in the UK whose owner has expressed interest in an OpenClaw but I warned him off because of all the horror stories. If this experiment continues to work out I might create some more agents for people like him.

I'm not sure it's super useful for individuals. I haven't felt any great need to treat it as a personal assistant yet. ChatGPT web UI works fine for most day to day stuff in my personal life. It's very much acting like an extra employee would at a software company, not a personal secretary or anything like that.

It sounds like our experience differs because you wanted something more controlled with access to your own personal information like email, etc, whereas I gave "Axiom" (it chose its own name) its own accounts and keep it strictly separated from mine. Also, so far I haven't given it many regular repeating tasks beyond a nightly wakeup to maintain its own home directory. I can imagine that for e.g. the accounting work we'd need to do some meta-work first on a calendar integration so it doesn't forget.

I’m doing this exact same thing in my solo SaaS company, except with Cursor’s Cloud Agents. I can kick them off from the web, Slack, Linear, or on a schedule, so I’m doing a lot of the same things as you. It’s just prompts on a cron, with access to some tools and skills, but super useful.
That unreliability was why I gave up on OpenClaw. I tried hard to give it very simple tasks, but it had a high failure rate. Heartbeats and RAG are light-years away from where they need to be. I'm not sure if this can be overcome using an application layer right now, but I trust that many people are trying, and I'm eager to see what emerges in the next year. In the meantime I know that they're working very hard on continuous learning - real-time updates to weights and parametric knowledge. It could be that in a year or so, we can all have customised models.
It would be great if that comes to fruition. Investing in a model with weight updates would be like investing in employee training, rather than just giving the same unreliable employee more and more specific instructions.