Hacker News
by canadiantim 1157 days ago
Would using Cozo make sense for a social network? Certainly would make modelling comments easier but how well does it handle many concurrent users?

Can you intermix different storage engines as well? For example, could a user have personal storage using SQLite but also easily save to RocksDB storage?

Regarding the time-travel capabilities, can they be leveraged to implement git-like features, such as querying the data at historical points in time?

Also, I'm curious about your thoughts on how secure data is within Cozo. Or, asked another way, how production-ready is Cozo? I know it's still early days, but could Cozo be used as the primary database in a product being delivered today?

Great work all around, really awesome to see!


Great questions!

- As can be seen at https://docs.cozodb.org/en/latest/releases/v0.3.html, about 200K QPS can be achieved for concurrent writes with 24 threads on a pretty old server. I think that is enough for a small to medium social network.

- You can start independent instances and use them together in your user code. You can have as many as you like, but data can only be exchanged through your code: they can't talk directly to each other.

- If by git-like you mean point-in-time queries, then yes, that's what the feature is for. But git comes with lots of other things, such as merge logic. These need to be implemented outside CozoDB.

- We do use CozoDB for data storage in production systems ourselves, and we back up a lot. So far nothing disastrous has happened. Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet), so you must make sure that only trusted clients can reach it (only an issue if you use the standalone server, since the embedded DBs do not open any ports).
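
To make the point-in-time bullet concrete, here is a minimal Python sketch of an "as of" lookup over an append-only history. It does not use CozoDB or its query language: the `History` class and its method names are hypothetical, and the sketch only illustrates the idea that every write keeps its timestamp, so a read can pick the latest version at or before a chosen time. Merge logic, as noted in the bullet, would still have to live elsewhere.

```python
import bisect
from collections import defaultdict


class History:
    """Toy append-only store (hypothetical): every write keeps its timestamp."""

    def __init__(self):
        # key -> parallel lists, both kept sorted by timestamp
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, key, value, ts):
        """Record that `key` had `value` starting at time `ts`."""
        i = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(i, ts)
        self._values[key].insert(i, value)

    def get_as_of(self, key, ts):
        """Return the value of `key` as it was at time `ts`, or None."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._values[key][i - 1] if i else None


h = History()
h.put("post:1", "first draft", ts=100)
h.put("post:1", "edited", ts=200)
print(h.get_as_of("post:1", 150))  # first draft
```

Older versions are never overwritten, so a query at `ts=150` still sees the first draft even after the later edit.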

As a graph “fanboy” I’m impressed, humbled, and inspired by the work that’s been done already and the direction you’re heading!

> Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet)

Please please please implement the Palantir security model unless you already have a smarter idea coming down the pipe. Palantir regularly scrubs past media from the internet, but there is a blog post that has the ACL slides from the now-private video: https://onetwo.ren/级联GraphQL访问控制/

I did some digging and found this: https://documents.pub/document/palantir-access-control.html, which appears to be the full slideshow.
Perfect yes, thank you.
Could this be used for creating a memory system, with weights and the ability to rewind thought chains? Would you be interested in partnering up? I'm not a database dev, but I have some great ideas, and I'm already reaching out to investors to build something in AI, and potentially have a partner. I'd love to build a database that is basically the midbrain of AI: a hybrid database built specifically for AI memories and memory relations. If you're open to collaborating and building a product, perhaps my ideas could be a good 'test case' and be mutually beneficial to all of us. Email: patrickwcurl - gmail.
Amazing! Thank you, all very encouraging answers. Congrats on everything you've achieved with Cozo so far!!

One last question if possible. Is there a recommended way to do Full Text Search on data stored in Cozo?

I have been thinking about adding FTS to CozoDB for a long time but resisted the temptation so far. The reason is that text search is language-specific: what works for one language does not work for another. There is simply no way that CozoDB can duplicate the work of a dedicated text search engine for all the languages in the world.

Our current solution is to use mutation callbacks to synchronize texts to a dedicated text search engine. The callback API depends on the programming language binding: for example, for Python: https://github.com/cozodb/pycozo#mutation-callbacks , and for Rust: https://docs.rs/cozo/latest/cozo/struct.Db.html#method.regis...
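
A rough Python sketch of that synchronization pattern follows. It does not import pycozo: the callback signature `(op_name, new_rows, old_rows)` and the `"Put"`/`"Rm"` operation names are assumptions modeled on the pycozo README linked above and should be checked there, and `TinyIndex` is a hypothetical in-memory stand-in for a real search engine. Rows are assumed to be `[id, text]` pairs, with removed rows arriving in `old_rows`.

```python
from collections import defaultdict


class TinyIndex:
    """Hypothetical stand-in for a dedicated text search engine."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of document ids
        self._docs = {}                    # document id -> indexed text

    def upsert(self, doc_id, text):
        self.delete(doc_id)  # drop stale tokens before re-indexing
        self._docs[doc_id] = text
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def delete(self, doc_id):
        text = self._docs.pop(doc_id, None)
        if text is None:
            return
        for token in text.lower().split():
            self._postings[token].discard(doc_id)

    def search(self, token):
        return sorted(self._postings.get(token.lower(), ()))


def make_sync_callback(index):
    """Build a callback that mirrors row changes into `index`.

    Assumed signature: (op_name, new_rows, old_rows), rows as [id, text].
    """
    def on_mutation(op_name, new_rows, old_rows):
        if op_name == "Put":
            for doc_id, text in new_rows:
                index.upsert(doc_id, text)
        elif op_name == "Rm":
            for doc_id, *_ in old_rows:
                index.delete(doc_id)
    return on_mutation


index = TinyIndex()
sync = make_sync_callback(index)
# Simulate the database reporting two inserts and then one removal.
sync("Put", [[1, "graph databases are fun"], [2, "Datalog queries on a graph"]], [])
sync("Rm", [], [[1, "graph databases are fun"]])
print(index.search("graph"))  # [2]
```

In a real setup the `upsert` and `delete` calls would go to the external search engine, and the callback would be registered with the database rather than invoked by hand.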

Sonic [1] might be a good fit, though it is not yet factored into a separate library [2].

[1]: https://github.com/valeriansaliou/sonic

[2]: https://github.com/valeriansaliou/sonic/issues/150

Thank you, that makes sense. Plus, with vector search there seem to be ways of shoehorning FTS into it. One could also potentially use the SQLite storage and piggyback off SQLite FTS5, but I'm not sure how well that setup would work.
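
For reference, this is roughly what SQLite FTS5 looks like from Python's standard `sqlite3` module. It is only a sketch of a side index kept in its own SQLite database: it assumes the local SQLite build includes the FTS5 extension, the table and column names are made up, and it says nothing about whether an FTS5 table can share a file with Cozo's SQLite storage, which remains the open question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; `doc_id` is stored but not tokenized.
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(doc_id UNINDEXED, body)")
conn.executemany(
    "INSERT INTO notes_fts (doc_id, body) VALUES (?, ?)",
    [
        ("a1", "CozoDB is a relational-graph database using Datalog"),
        ("b2", "Sonic is a lightweight search backend"),
    ],
)


def search(query):
    """Return matching doc_ids, best match first."""
    rows = conn.execute(
        "SELECT doc_id FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [doc_id for (doc_id,) in rows]


print(search("datalog"))  # ['a1']
```

The returned `doc_id` values would then be used as keys to fetch the full records from the primary database.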
What about branching?
You’ve itemized almost my entire wish list.

In terms of “timetravel”, I want to see exactly what an item was at a specific time (COW with metadata works decently, but I’d love graph snapshots/diffs).

And one more thing: 20% parity data for everything that’s in the system, stored in a way that it can be verified at rest and can also be exported and then verified locally.

Yes, I know filesystems are great at reliability now, but safely transferring data between systems is beyond their scope.
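
One way to read the "20% parity" wish is one parity block for every five data blocks. The Python sketch below shows that arithmetic with plain XOR parity, which can rebuild a single damaged block per group and, together with checksums, lets an exported copy be verified offline. The scheme, the group size, and all function names are assumptions for illustration: the comment does not name a scheme, and real tools typically use stronger erasure codes such as Reed-Solomon.

```python
import hashlib

GROUP = 5  # five data blocks per parity block, i.e. 20% overhead


def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def protect(blocks):
    """Return (parity, checksums) for one group of equal-length blocks."""
    if len(blocks) != GROUP:
        raise ValueError(f"expected {GROUP} blocks, got {len(blocks)}")
    parity = xor_blocks(blocks)
    checksums = [hashlib.sha256(b).hexdigest() for b in blocks]
    return parity, checksums


def verify(blocks, checksums):
    """Return indexes of blocks whose checksum no longer matches."""
    return [i for i, (b, c) in enumerate(zip(blocks, checksums))
            if hashlib.sha256(b).hexdigest() != c]


def repair(blocks, parity, bad_index):
    """Rebuild one damaged block from the parity and the other blocks."""
    others = [b for i, b in enumerate(blocks) if i != bad_index]
    return xor_blocks(others + [parity])


data = [bytes([n]) * 8 for n in range(1, 6)]
parity, sums = protect(data)
damaged = list(data)
damaged[2] = b"\x00" * 8             # simulate corruption of one block
bad = verify(damaged, sums)          # -> [2]
damaged[2] = repair(damaged, parity, bad[0])
print(damaged == data)  # True
```

Because the parity block and checksums travel with the data, the same `verify` and `repair` steps work on an exported copy without access to the original system.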