Anchor Engine – deterministic semantic memory for LLMs, <1GB RAM runs on a phone (github.com)
1 point by BERTmackliin 105 days ago
4 comments

The inspectability angle is genuinely useful: being able to trace exactly why something was retrieved is something vector search can't offer, and the tag-receipt approach is clean for structured knowledge.

One thing I'm trying to understand: the README calls this "semantic" retrieval, but looking at the Unified Field Equation in the whitepaper, the core scoring is tag intersection with temporal decay: W(q,a) = (shared tags) × γ^(graph distance) × (recency). That's weighted keyword matching, which is deterministic precisely because it's lexical, not semantic.
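To make the point concrete, here is a minimal TypeScript sketch of that scoring as described (shared-tag overlap × γ^distance × recency). The recency term is assumed to be an exponential half-life decay, and `gamma` and `halfLifeDays` are made-up defaults; the whitepaper's actual functions and constants may differ.

```typescript
// Hypothetical sketch of W(q, a) = |shared tags| × γ^(graph distance) × recency.
// ASSUMPTION: recency is modelled as a half-life decay; the real formula may differ.

interface Atom {
  id: string;
  tags: Set<string>;
  ageDays: number; // time since the atom was written
  hops: number;    // graph distance from the query's seed nodes
}

// Count tags that appear literally in both sets (pure lexical overlap).
function sharedTagCount(queryTags: Set<string>, atomTags: Set<string>): number {
  let n = 0;
  queryTags.forEach((t) => {
    if (atomTags.has(t)) n++;
  });
  return n;
}

function score(
  queryTags: Set<string>,
  atom: Atom,
  gamma = 0.5,       // hypothetical per-hop damping factor
  halfLifeDays = 30, // hypothetical recency half-life
): number {
  const overlap = sharedTagCount(queryTags, atom.tags);
  const distancePenalty = Math.pow(gamma, atom.hops);
  const recency = Math.pow(0.5, atom.ageDays / halfLifeDays);
  return overlap * distancePenalty * recency;
}
```

With a query tagged `jwt` and `auth`, a fresh, directly matching atom carrying both tags scores 2, and the same match one hop away and 30 days old scores 0.5. An atom tagged only `jwt` scores exactly 0 for a query tagged `authentication`, because nothing in the formula relates the two tags.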

vector.ts also defines MockSoulIndex as a no-op stub, with a note saying dense vector search is "optional augmentation" that's currently disabled, so no embeddings are running in practice.

I've been building in this space with hand-written TypeScript (no AI codegen) and the line between "semantic" and "keyword" matters a lot to users. If someone stores "the JWT conversation" they won't find it by querying "authentication."

Is the tag extraction smart enough to bridge that, or is explicit tagging on the user to handle?

@silentsvn - thank you for reading carefully enough to ask this. You're correct that the core scoring is tag‑based and deterministic, which is lexical, not "semantic" in the modern embedding sense. The terminology is worth unpacking.

We call it "semantic" in the broader sense of meaning‑bearing structure—the graph encodes relationships between concepts, and retrieval walks those relationships. But you're correct that at query time, it's matching on tags, not vector similarity.

Why not embeddings? We made a deliberate trade‑off: determinism and explainability over fuzziness. With vector search, you get a black‑box similarity score and no way to debug why something was retrieved. With tag‑based traversal, you can trace the exact path: "This result matched because it shares tags X, Y, Z and is within 2 hops of your query." That matters for agentic workflows where auditability is critical.
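A traversal that keeps its own receipts is easy to sketch. The TypeScript below is a hypothetical breadth-first walk (the graph shape and names like `traceWithin` are illustrative, not taken from the Anchor codebase) that records the path to every node it reaches within `maxHops`, so each result can be explained as a chain of hops from the query.

```typescript
// Hypothetical sketch of traceable retrieval: BFS from seed tags that records
// the exact path to each reached node. Not Anchor's STAR implementation.

type Graph = Map<string, string[]>; // node -> neighbouring nodes

interface Receipt {
  node: string;
  path: string[]; // seed ... node, i.e. the "why" behind the result
  hops: number;
}

function traceWithin(graph: Graph, seeds: string[], maxHops: number): Receipt[] {
  const receipts: Receipt[] = [];
  const seen = new Set<string>(seeds);
  let frontier: Receipt[] = seeds.map((s) => ({ node: s, path: [s], hops: 0 }));
  while (frontier.length > 0) {
    receipts.push(...frontier);
    const next: Receipt[] = [];
    for (const r of frontier) {
      if (r.hops === maxHops) continue; // stop expanding at the hop limit
      for (const nb of graph.get(r.node) ?? []) {
        if (seen.has(nb)) continue; // first (shortest) path wins
        seen.add(nb);
        next.push({ node: nb, path: [...r.path, nb], hops: r.hops + 1 });
      }
    }
    frontier = next;
  }
  return receipts;
}
```

Because the walk is breadth-first and adjacency lists are plain arrays, the same graph and seeds always yield the same receipts in the same order, which is the determinism being traded for fuzziness.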

Tag extraction is where we do the work to bridge the lexical gap. The atomization pipeline uses:
- Wink NLP for entity recognition and part‑of‑speech filtering (so "authentication" and "JWT" both get tagged with relevant concepts if they appear in context).
- Co‑occurrence windows to infer relationships (e.g., if "JWT" and "authentication" repeatedly appear near each other, they get linked in the graph).
- Synonym expansion (via Standard 111) so queries for "authentication" can surface nodes tagged with "JWT" if the system has learned that relationship from your corpus.
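Co-occurrence linking and query expansion can be sketched as follows. This assumes tokens are already extracted and normalized (the Wink NLP step is not shown); the window size, threshold, and function names are hypothetical, and the sketch does not reproduce whatever rules Standard 111 defines.

```typescript
// Hypothetical sketch: link tags that repeatedly co-occur, then expand a query
// tag to its linked tags. Assumes normalised tokens containing no "|".

// Count unordered token pairs inside a sliding window of `window` tokens
// (window = 2 means adjacent tokens only).
function cooccurrenceCounts(tokens: string[], window: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < tokens.length; i++) {
    for (let j = i + 1; j < Math.min(i + window, tokens.length); j++) {
      const a = tokens[i];
      const b = tokens[j];
      if (a === b) continue;
      const key = a < b ? `${a}|${b}` : `${b}|${a}`; // order-independent pair key
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Keep only pairs seen at least `minCount` times, stored as undirected links.
function buildLinks(counts: Map<string, number>, minCount: number): Map<string, Set<string>> {
  const links = new Map<string, Set<string>>();
  const addLink = (from: string, to: string) => {
    const set = links.get(from) ?? new Set<string>();
    set.add(to);
    links.set(from, set);
  };
  counts.forEach((n, key) => {
    if (n < minCount) return;
    const [a, b] = key.split("|");
    addLink(a, b);
    addLink(b, a);
  });
  return links;
}

// A query tag expands to itself plus every linked tag, sorted for determinism.
function expandQuery(tag: string, links: Map<string, Set<string>>): string[] {
  const linked = links.get(tag);
  return [tag, ...(linked ? Array.from(linked) : [])].sort();
}
```

With a window of 2 and a threshold of 2, the token stream `jwt authentication token jwt authentication login` links "authentication" to "jwt" because they are adjacent twice, while "authentication" and "login" (adjacent once) stay unlinked. That is the "it's not magic" caveat in miniature.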

It's not magic - if you never mention "JWT" in the same context as "authentication," the graph won't connect them. But that's a feature, not a bug: the system reflects your actual usage, not a statistical average of the internet.

The trade‑off is real: you give up the fuzzy "close enough" retrieval of vectors in exchange for perfect traceability and no embedding drift. For many use cases (project memory, execution traces, personal knowledge bases), that's the right call.

I'd love to hear more about what you're building in this space. Always good to find others thinking about these trade‑offs.

Thanks for the response.

The determinism trade-off is genuinely interesting — auditability over fuzziness is a real design philosophy, not just a limitation.

We've been building something that tries to avoid forcing that choice. Engram uses three strategies in parallel: vector embeddings (nomic-embed-text via Ollama, local-first), BM25 keyword, and temporal recency — merged with Reciprocal Rank Fusion. Each result comes back with an explicit similarity score and the tier it came from (working memory / long-term / archived), so the retrieval path is still traceable even when it's fuzzy.
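For readers unfamiliar with Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) to every document it contains, and the summed scores are re-sorted. A minimal TypeScript sketch, assuming the commonly used k = 60 and an id-based tie-break; Engram's actual constant and tie-breaking may differ.

```typescript
// Minimal Reciprocal Rank Fusion sketch. Rank is 1-based; k = 60 is the
// constant commonly used for RRF (an assumption about Engram's setting).

function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties broken by id so output is stable.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}
```

RRF looks only at ranks, so cosine similarities, BM25 scores, and recency timestamps never need to be normalized onto a common scale. Fusing `["a","b","c"]`, `["b","a","d"]`, and `["c","b"]` ranks "b" first because it sits near the top of all three lists.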

We also layer on a graph component similar to yours — entity-relationship extraction that augments top results with connected context. The difference is that graph is additive on top of embedding retrieval rather than the primary mechanism.

The place your approach wins clearly is corpus-specific precision. If the graph is built from your actual usage (your JWT/authentication example), tag traversal will reliably surface relationships that vectors would miss or dilute with internet priors. That's a real advantage for execution traces and project memory.

Still working through the right defaults for consolidation (when to summarize old working memories vs keep them granular). Curious whether you've thought about memory aging in your model.

Repo if curious: http://github.com/Cartisien/engram

I built Anchor because I kept hitting the same wall: local LLMs are great, but every conversation is a fresh start. Vector search is the default hammer, but for structured memory—project decisions, entity relationships, temporal facts—it's often the wrong tool.

Live demo (in-browser, no setup): https://rsbalchii.github.io/anchor-engine-node/demo/index.ht...

Search Moby Dick or Frankenstein and see the tag-based receipts that show why each result matched.

How it works
Anchor uses graph traversal (the STAR algorithm) instead of embeddings. Concepts become nodes, relationships become edges. The database stores only pointers (file paths + byte offsets); content stays on disk, so the index is small and rebuildable. PGlite (PostgreSQL in WASM) lets it run anywhere Node.js does, including a Pixel 7 in Termux, with <1GB RAM.
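The pointer-only idea can be sketched in a few lines of Node.js TypeScript: the index keeps a file path, byte offset, and byte length per chunk, and the text is read back from disk on demand. The names `ChunkPointer`, `writeAndIndex`, and `readChunk` are hypothetical, one line per chunk is an assumed granularity, and the PGlite storage layer is not modeled.

```typescript
// Hypothetical sketch of a pointer-only index: rows record where a chunk lives
// (path + byte offset + byte length); content is read from disk on demand.
import { closeSync, openSync, readSync, writeFileSync } from "node:fs";

interface ChunkPointer {
  path: string;
  byteOffset: number;
  byteLength: number;
}

// Writes `text` to `path` (only so this sketch is self-contained) and returns
// one pointer per line. Offsets are in bytes, not UTF-16 code units.
function writeAndIndex(path: string, text: string): ChunkPointer[] {
  writeFileSync(path, text, "utf8");
  const pointers: ChunkPointer[] = [];
  let offset = 0;
  for (const line of text.split("\n")) {
    const len = Buffer.byteLength(line, "utf8");
    pointers.push({ path, byteOffset: offset, byteLength: len });
    offset += len + 1; // skip the "\n" separator (1 byte in UTF-8)
  }
  return pointers;
}

// Reads exactly the bytes a pointer refers to; the index never holds content.
function readChunk(ptr: ChunkPointer): string {
  const fd = openSync(ptr.path, "r");
  try {
    const buf = Buffer.alloc(ptr.byteLength);
    const bytesRead = readSync(fd, buf, 0, ptr.byteLength, ptr.byteOffset);
    return buf.subarray(0, bytesRead).toString("utf8");
  } finally {
    closeSync(fd);
  }
}
```

Computing offsets with `Buffer.byteLength` rather than string length keeps lines with multi-byte characters from shifting every later pointer. Since the index holds only these small records, it can be dropped and rebuilt from the files at any time.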

Performance
- <200ms p95 search on a 28M-token corpus
- <1GB RAM: runs on a $200 mini PC, a Raspberry Pi, or a phone
- Pure JS/TS, compiled to WASM, no cloud dependencies

What’s new in v4.6
- distill: lossless compression of your corpus into a single deduplicated YAML file. I tested it on 8 months of my own chat logs: 2336 → 1268 unique lines, 1.84:1 compression, 5 minutes on a Pixel 7.
- MCP server (v4.7.0): exposes search and distillation to any MCP client (Claude Code, Cursor, Qwen tools)
- Adaptive concurrency: automatic switching between sequential (mobile) and parallel (desktop) processing
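One way line-level deduplication can stay lossless is to keep a table of unique lines plus, for every original line, an index into that table. The TypeScript below is a hypothetical sketch of that idea, not Anchor's `distill` (which writes YAML in a format not described here); it also shows the ratio arithmetic, 2336 / 1268 ≈ 1.84.

```typescript
// Hypothetical sketch of lossless line deduplication: unique lines in
// first-seen order, plus a per-line index that allows exact reconstruction.

interface Distilled {
  unique: string[]; // each distinct line, in first-seen order
  order: number[];  // for every original line, its index into `unique`
}

function distillLines(lines: string[]): Distilled {
  const indexOf = new Map<string, number>();
  const unique: string[] = [];
  const order: number[] = [];
  for (const line of lines) {
    let idx = indexOf.get(line);
    if (idx === undefined) {
      idx = unique.length;
      indexOf.set(line, idx);
      unique.push(line);
    }
    order.push(idx);
  }
  return { unique, order };
}

// Rebuild the original sequence of lines exactly.
function restore(d: Distilled): string[] {
  return d.order.map((i) => d.unique[i]);
}

// Ratio of original lines to unique lines, e.g. 2336 / 1268.
function compressionRatio(totalLines: number, uniqueLines: number): number {
  return totalLines / uniqueLines;
}
```

Without the `order` array, dropping duplicates would lose where and how often each line appeared, so it is the per-line index, not the unique table alone, that makes the round trip exact.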

The recursion
I used Anchor to build Anchor. Every bug fix and design decision is in the graph; that's how I kept the complexity manageable.

Where it fits
If you're building local agents, personal knowledge bases, or mobile assistants and want memory that's inspectable, deterministic, and lightweight, this is for you.

GitHub repo: https://github.com/RSBalchII/anchor-engine-node

Whitepaper: https://github.com/RSBalchII/anchor-engine-node/blob/main/do...

Happy to answer questions about the algorithm, the recursion, or the mobile optimizations.

I think this is going on my list of things I want to try. I have some feedback, but need to qualify it with a warning that I've barely used any AI beyond simple chat bots. This is going to be the opposite of the feedback that silentsvn gave you, meaning I have no idea what I'm talking about :-)

TL;DR: You need a "how to use it" section that explains how to get information in and out of the context. That's assuming I'm not completely misunderstanding the purpose.

I started using Claude Code about a week ago, but my goal is to get something running locally that can help me get things accomplished. I'm skeptical of the claims that AI can do the work for us, but I'm interested in the idea that we can offload a bunch of cognitive load onto it freeing up brain space for the actual problems we're trying to solve. Some kind of memory system is the starting point IMO.

So here's my feedback. I skimmed the repo. You explain what it does and how it does it, but I have no idea what it does or how it does it. I think your explanations are too technical for people to understand why they'd want something like this, and the example makes it look like a simple search engine. I think you need more of an explain-it-like-I'm-five approach. I might know enough to be the five-year-old in the conversation, so I'll explain a few issues I've been having and maybe you can tell me if / how your tool helps.

Most of this is in the context of using Claude Code.

I noticed the amnesia problem immediately, but expected it. I figured I'd need to take a couple of days to configure the system to remember things and adhere to my preferences, but now I realize that was wildly optimistic. Regardless, I started making a very naive system that uses markdown files with the goal of getting a better understanding of managing memory and context together. It tries to limit the current context, but it's naive. It walks a hierarchy and dumps things into the context. It's just for me to learn. I'll be happy if it helps me understand enough to pick a good tool that already exists.

The first big problem I hit was that I want what you describe as compounds, mainly chat exports, especially as I'm starting out and just want to "dump" information somewhere. I want all my chat history as I'm learning something. I had a big aha moment when I asked Claude to write our conversation to a markdown file and it told me it couldn't, but offered to output a summary. I'm losing information in real time as I chat. I don't know if it's valuable or not because I don't know enough to know what I don't know.

I've been getting the most value from chatting with the AI to learn and plan things. That involves a lot of ideas, right or wrong, and I want to be able to save and retrieve those chats verbatim so I can get back to the exact same context in the future. I don't know if that's a good or bad idea, but I figure that, if I can retrieve the original context, I can always have the AI summarize it or have it help me create something more well structured once I understand the topic a bit better. I also think there's probably some value in having a future model re-evaluate that old context. For example, in the future I can start it with the current refined context (how I implemented things) and have it walk through all that old context to see if there are any novel ideas that might help to solve existing issues.

I'm assuming your spec documents are followed by the AI when working on the project. Is that right? If so, I wonder if you're underselling that by not giving an ELI5 example of how that works. For me, that's a hard problem to solve. I want a semantic search for rules the model needs to apply but I don't really want it to be semantic because they're rules that must be applied. I need to be able to ask "why isn't the tool following my docker compose spec" and need a deterministic way to answer that. I think your project does that.

Maybe I'm simply lacking knowledge and should be able to understand why I need this kind of tool and, more importantly, how it maps to context management (assuming that's what it does).

I'll give you an analogy, at least that applies to me. Your "how it works" section is like going to driver training and having the instructor start explaining how the car's engine and transmission are built. People like me need it dumbed down; "Push the gas and turn the wheel. It's faster than your bicycle."

Maybe I'm not the target audience yet, but maybe I am. I'm already convinced that AI with good memory management is useful. I'm also unwilling to build that memory using a commercial system like Claude or ChatGPT. It's vendor lock-in on the level of getting a lobotomy if you lose access to that system and I don't think people are doing a good job of assessing that risk.

I'm going to finish building my own crappy memory system and then yours is going to be the first real system I try. Thanks for sharing it.