Hacker News
by SkyPuncher 19 days ago
How does your technical approach actually achieve accurate fact extraction?

You lose sooooooo much meaningful context and information when you transform something into a knowledge graph. Simple cases like "Gabe is CEO of Valve" map nicely to a graph, but a statement like "Matt Garman is CEO of AWS" doesn't capture that AWS is a subsidiary of Amazon (with its own CEO).
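To make the complaint concrete, here is a minimal sketch of a triple store. The predicate names `ceo_of` and `subsidiary_of` and the `TripleStore` class are hypothetical. The triple extracted from "Matt Garman is CEO of AWS" says nothing about Amazon; that relationship only exists in the graph if a separate triple was extracted from somewhere else.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        # subject -> list of (predicate, object) edges
        self.out = defaultdict(list)

    def add(self, subj, pred, obj):
        self.out[subj].append((pred, obj))

    def objects(self, subj, pred):
        return [o for p, o in self.out[subj] if p == pred]

    def parent_chain(self, org):
        """Follow hypothetical 'subsidiary_of' edges up to the root company."""
        chain, seen, current = [], {org}, org
        while True:
            parents = self.objects(current, "subsidiary_of")
            # Stop at the root, or if the edges form a cycle.
            if not parents or parents[0] in seen:
                return chain
            current = parents[0]
            seen.add(current)
            chain.append(current)


kg = TripleStore()
kg.add("Gabe", "ceo_of", "Valve")
kg.add("Matt Garman", "ceo_of", "AWS")
print(kg.parent_chain("AWS"))  # [] : nothing links AWS to Amazon yet
```

Only after something like `kg.add("AWS", "subsidiary_of", "Amazon")` does `parent_chain("AWS")` return `["Amazon"]`.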

Additionally, one of my biggest gripes with Claude's memories, and with every memory system I've worked with, is that they completely fail to capture intent. The architecture notes I documented while doing a wild spike on a critical infrastructure component absolutely should not be referenced in everyday work. Yet somehow, that type of memory always works its way into unrelated sessions.


In your specific example, we can rely on external tool calls to supplement limited information. If the memory system isn't rich enough to give a good answer, we can always call back into the original data source for more information. Extraction works best when it's eager (extract more than you think you need) and relies on deduplication and graph orchestration mechanics to keep the graph size bounded.
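A minimal sketch of "eager extraction plus deduplication", assuming facts arrive as (subject, predicate, object) triples and that entity resolution is nothing more than case and whitespace normalization (a real system would need much fuzzier matching). `normalize`, `MemoryGraph`, and `ingest` are hypothetical names, not the poster's implementation. Re-extracting a fact that is already stored adds nothing, so over-extracting doesn't by itself grow the graph without bound.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(name: str) -> str:
    # Hypothetical canonicalization: collapse whitespace and casefold so
    # "Valve", "valve " and "VALVE" resolve to the same node.
    return " ".join(name.split()).casefold()


@dataclass
class MemoryGraph:
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, subj: str, pred: str, obj: str) -> bool:
        """Store a triple; return False if it was a duplicate."""
        triple = (normalize(subj), normalize(pred), normalize(obj))
        if triple in self.triples:
            return False
        self.triples.add(triple)
        return True

    def ingest(self, extracted: list[tuple[str, str, str]]) -> int:
        """Add every extracted triple; return how many were new."""
        return sum(self.add(*t) for t in extracted)


g = MemoryGraph()
first = g.ingest([
    ("Gabe", "is CEO of", "Valve"),
    ("Matt Garman", "is CEO of", "AWS"),
    ("AWS", "subsidiary of", "Amazon"),
])
# A second document restates the same facts with different formatting.
second = g.ingest([
    ("gabe", "is ceo of", "VALVE"),
    ("AWS ", "subsidiary  of", "Amazon"),
])
print(first, second, len(g.triples))  # 3 0 3
```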

There is also a capture problem. Imagine you hire an intern and tell them "John Smith is the CEO of Foo". If they've never heard of Foo, it would be impossible for them to infer anything about the nature of Foo unless they're allowed to look things up in the outside world. No system (not even a human!) can capture 100% of information, but that doesn't mean the system is broken. The question is whether you can organize and collect enough information to (a) address most queries and (b) initiate a deeper investigation when the information is incomplete. We believe the answer is yes.
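Points (a) and (b) can be sketched as "answer from memory, otherwise escalate". Here `fetch_source` is a hypothetical stand-in for the external tool call back into the original data, and any result is written back into memory so the next lookup doesn't need the tool. This is an illustration under those assumptions, not the described system.

```python
from __future__ import annotations

from typing import Callable, Optional

Fact = tuple[str, str]  # (subject, predicate)


def answer(
    query: Fact,
    memory: dict[Fact, str],
    fetch_source: Callable[[Fact], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (answer, origin), where origin is 'memory', 'source' or 'unknown'."""
    # (a) Most queries should be answerable from what was already captured.
    if query in memory:
        return memory[query], "memory"
    # (b) Memory is incomplete: escalate to the original data source.
    found = fetch_source(query)
    if found is not None:
        memory[query] = found  # cache so the next lookup is served from memory
        return found, "source"
    return None, "unknown"


memory = {("John Smith", "is CEO of"): "Foo"}
# Hypothetical source data; dict.get returns None for missing keys.
source = {("AWS", "subsidiary of"): "Amazon"}

print(answer(("John Smith", "is CEO of"), memory, source.get))  # ('Foo', 'memory')
print(answer(("AWS", "subsidiary of"), memory, source.get))     # ('Amazon', 'source')
print(answer(("AWS", "subsidiary of"), memory, source.get))     # ('Amazon', 'memory')
print(answer(("Foo", "subsidiary of"), memory, source.get))     # (None, 'unknown')
```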

Intent works much the same way. Will hybrid search surface your architecture notes at some point, for an unrelated reason? Almost certainly. Should there be enough surrounding context to indicate that they were written during a spike? Also yes (this is where Claude/markdown memories fail). That context should be enough for the system to remain massively useful on net, and the error rate will go down over time.
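One hedged reading of "enough surrounding context" is to store a context tag with each memory and down-weight, rather than drop, memories whose tag doesn't match the current session. The sketch blends a keyword-overlap score with cosine similarity over embeddings (a simple form of hybrid search). The `context` field, the toy two-dimensional embeddings, `alpha`, and `mismatch_penalty` are all assumptions for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    context: str  # hypothetical tag, e.g. "spike" or "daily"


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the memory text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query, query_vec, memories, session_context,
                  alpha=0.5, mismatch_penalty=0.3):
    """Rank memories by blended score, penalizing context mismatches."""
    scored = []
    for m in memories:
        score = alpha * keyword_score(query, m.text) \
            + (1 - alpha) * cosine(query_vec, m.embedding)
        if m.context != session_context:
            score *= mismatch_penalty  # down-weight, but keep it retrievable
        scored.append((score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored]


spike = Memory("load balancer architecture notes from infra spike", [1.0, 0.0], "spike")
daily = Memory("load balancer config for daily deploys", [0.8, 0.6], "daily")

ranked = hybrid_search("load balancer config", [1.0, 0.0], [spike, daily], "daily")
print([m.context for m in ranked])  # ['daily', 'spike']
```

With this weighting the spike notes still come back, but they rank below on-topic memories from everyday work unless the session itself is tagged as a spike.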