by sorobahn
103 days ago
I'm currently working on building this layer. It's an interesting problem even when you remove AI agents from the picture; I feel a context layer can be just as useful for humans and deterministic programs. I view it as a data structure sitting on top of your entire domain, and this data structure's query interface plus some basic tools should be enough to bootstrap non-trivial agents imo. I think the data structure best suited to this problem is a graph, with the different types of data represented as graphs. Stitching API calls together is analogous to representing relationships between entities, and that's ultimately why I think graph databases have a chance in this space. As any domain grows, the relationships usually grow at a higher rate than the nodes, so you want a query language that is optimized for traversing relationships between things. This is where the pattern-matching approach of ISO GQL (inspired by Cypher) is more token efficient than SQL. The problem is that our foundation models have seen far more SQL, so there is a training gap, but I would bet that if the training data were equally abundant we'd see better performance on Cypher than on SQL. I know there is GraphRAG, along with hybrid approaches involving vector embeddings and graph embeddings, but maybe we also need to reduce API calls down to semantic graph queries on their respective domains, so we just have one giant graph we can scavenge for context.
The insight about relationships growing faster than nodes is spot on, and it's why we think the graph model is the natural fit for context layers. But in practice, you also need documents, vectors, and sometimes time-series data alongside the graph. Forcing everything into a single model (or stitching together multiple databases) creates friction that kills agent workflows.
On the GQL/Cypher vs SQL point — agreed on token efficiency. We support both SQL (extended with graph capabilities) and Cypher-style syntax, and the difference in prompt size for traversal queries is dramatic. An N-hop relationship query that takes 5+ lines of SQL JOINs is a single readable line in a graph query language. For LLM-generated queries, that's not just an aesthetic win — it directly reduces error rates and token costs.
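To make the size difference concrete, here is a rough Python sketch that builds the same N-hop traversal both ways and compares the resulting query text. The `edges(src, dst)` table, the `Entity` label, and the `RELATES_TO` relationship type are hypothetical names chosen for illustration, not a real schema, and character count is only a crude stand-in for token count. The two queries are also only roughly equivalent, since Cypher will not reuse a relationship within a single matched path while the chained joins will.

```python
def sql_n_hop(n: int, start_id: int) -> str:
    """Build an n-hop traversal as chained self-joins.

    Assumes a hypothetical edges(src, dst) table. The id is inlined only
    to keep the sketch short; real code should use bound parameters.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    lines = [f"SELECT DISTINCT e{n}.dst", "FROM edges e1"]
    # Each extra hop costs one more self-join on the edge table.
    for i in range(2, n + 1):
        lines.append(f"JOIN edges e{i} ON e{i}.src = e{i - 1}.dst")
    lines.append(f"WHERE e1.src = {start_id};")
    return "\n".join(lines)


def cypher_n_hop(n: int, start_id: int) -> str:
    """Build the same traversal as a Cypher-style fixed-length path pattern.

    The Entity label and RELATES_TO type are hypothetical.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return (
        f"MATCH (a:Entity {{id: {start_id}}})-[:RELATES_TO*{n}]->(b) "
        "RETURN DISTINCT b.id;"
    )


if __name__ == "__main__":
    for hops in (2, 4, 6):
        sql = sql_n_hop(hops, 42)
        cypher = cypher_n_hop(hops, 42)
        print(f"{hops} hops: SQL {len(sql)} chars over "
              f"{len(sql.splitlines())} lines, Cypher {len(cypher)} chars")
```

The SQL text grows by one JOIN line per hop, while the pattern-matching form only changes the hop count inside the relationship pattern.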
Re: GraphRAG — we've seen the same convergence. Vector similarity to find the right neighborhood, then graph traversal for structured context. Having both in one engine (ArcadeDB supports vector indexing natively) means you avoid the API orchestration overhead you mention. One query, one database, full context.
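The "vector similarity first, graph traversal second" pattern can be sketched in plain Python. This is an in-memory illustration only, not ArcadeDB's API: the node names, toy embeddings, and the `retrieve_context` helper are all made up for the example, and a real engine would use a vector index instead of a brute-force scan.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(query_vec, embeddings, edges, k=1, hops=1):
    """Pick the k most similar nodes, then expand along outgoing edges.

    embeddings: dict mapping node -> vector (hypothetical toy data)
    edges:      dict mapping node -> list of neighbor nodes (directed)
    """
    # Step 1: vector similarity finds the right neighborhood (seed nodes).
    seeds = sorted(
        embeddings,
        key=lambda node: cosine(query_vec, embeddings[node]),
        reverse=True,
    )[:k]

    # Step 2: breadth-first traversal collects structured context.
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Hypothetical toy domain.
EMBEDDINGS = {
    "invoice": [1.0, 0.0],
    "refund": [0.6, 0.8],
    "customer": [0.0, 1.0],
}
EDGES = {
    "invoice": ["customer"],
    "customer": ["ticket"],
    "refund": ["invoice"],
}

if __name__ == "__main__":
    print(retrieve_context([1.0, 0.05], EMBEDDINGS, EDGES, k=1, hops=2))
```

Doing both steps inside one engine would collapse this into a single query, which is the orchestration saving described above.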
The training gap for graph query languages is real but closing fast. As more agent frameworks adopt graph-based context, the flywheel will kick in.