|
Hey HN,I'm Al-ameen, been running OpenClaw as my personal agent over Telegram/WhatsApp for a few weeks now. It's incredibly powerful for proactive stuff (calendar, emails, code tasks), but the token burn is brutal — heartbeats alone were eating $50–100/mo on Claude Opus before I started optimizing.OpenClaw's native memory (SOUL.md, session history) works for short sessions but forgets across restarts or long threads, leading to repetitive prompting and higher costs. So I started hooking it up to Whisper (a context/memory API with hybrid search, knowledge graphs, and delta compression) via custom skills.It's still hacky — no native integration, just a Whisper skill that queries compressed context/memories before feeding prompts to the agent. But early results align with what others report: noticeably fewer tokens in multi-turn convos, especially when combining with cheap model routing (Gemini Flash / DeepSeek for heartbeats).What I did (basics)Baseline tweaks first (these alone helped a lot):In ~/.openclaw/openclaw.json:json {
"agents": {
"defaults": {
"model": "gemini/flash-lite",
"fallbacks": ["anthropic/claude-3-opus"],
"contextTokens": 150000
},
"heartbeat": {
"model": "deepseek/v3.2",
"interval": "45m"
}
}
} → Switched trivial stuff away from Opus.
Whisper skill setupCreated ~/.openclaw/workspace/skills/whisper/:SKILL.md # Whisper Context Fetch
Pulls delta-compressed context / memories to reduce prompt size. @whisper get "summarize auth changes in repo" --project openclaw-exp index.ts (simplified)ts import { Whisper } from '@usewhisper/sdk'; const whisper = new Whisper({
apiKey: process.env.WHISPER_API_KEY,
orgId: process.env.WHISPER_ORG_ID
}); export async function getContext(args: { query: string; project: string }) {
const res = await whisper.query({
project: args.project || 'openclaw-exp',
query: args.query,
compress: true,
compression_strategy: 'delta',
include_memories: true
});
return {
context: res.context,
memories: res.memories?.map(m => `${m.type}: ${m.content}`) || [],
meta: res.meta
};
} Then in agent instructions (AGENTS.md or prompt templates): "If context/history is large, call @whisper
get first and prepend the result."
Wrapper example (in a custom loop or cron)ts async function runAgentWithContext(msg: string) {
let enhanced = msg;
if (/* detect long context needed */) {
const wc = await getContext({ query: msg });
enhanced = `Context diff: ${wc.context}\nMemories:\n${wc.memories.join('\n')}\n\n${msg}`;
}
return openclaw.run(enhanced);
} Observations so farDelta compression seems to genuinely cut repeat context (e.g., codebase overviews drop from 10–15k → 1–3k tokens after first message).
Memory types (preferences, events, goals) help avoid re-explaining user prefs.
Combined with routing, my rough tracking shows ~60–75% lower usage on mixed workloads (not 97% like some YouTube claims, but meaningful).
Latency added (~200–500ms per query) but worth it for longer sessions.
Still manual; over-indexing sources bloats Whisper project if not careful. Caveats: This is custom / early. Whisper costs $20/mo after trial. OpenClaw itself has had security noise lately (skills gone rogue, etc.), so sandbox everything. Not production-ready for high-stakes use.Curious from folks running OpenClaw:What's your biggest token sink (heartbeats, tools, long histories)?
Tried context layers / vector DBs? Actual savings numbers?
Better cheap fallbacks than Flash/DeepSeek for non-reasoning tasks? Happy to share more snippets or debug if anyone's experimenting similarly. |