by FractalNerve
3190 days ago
> How else could you solve what Vespa does using Rust, Go, or C/C++ libraries?
Let me try answering my own question. I hope someone hops in and tells me where I'm wrong or how else to improve this :)
1) Get PostgreSQL extensions via the "package manager" pgxnclient
1.1) PgBouncer - For connection pooling
1.2) yoke - As a high-availability cluster manager with auto-failover and automated cluster recovery
1.3) prestodb.io - Distributed SQL query engine for pgsql
1.4) pglogical - Logical streaming replication using a publish/subscribe model
1.5) pg_lambda - To create your own AWS (meta) Lambda
1.6) pg_strom - To offload tasks to the GPU
1.7) zombodb - To utilize full-text searching via indexes backed by Elasticsearch
2) Put it all together with pglogical and presto to separate GPU/CPU-intensive tasks.
2.1) "Build Missing Middleware" - To design/fuse a query visually that combines multiple backends
2.1.1) Create a binary data-stream by integrating pg_lambda, pg_strom, presto and zombodb
2.1.2) "Build Missing Middleware" - A tensor processing extension to use ML Model evaluations
2.1.3) "Use Missing Middleware" - For data-processing via Machine-Learning models
2.1.4) "Use Missing Middleware" - To output ML processed results into the database
2.2) Partition these queries using "pg_lambda + middleware" to create accelerated and fused query results
So what's missing to create a Vespa alternative using existing technologies is everything in point 2), if I'm not mistaken. Torrent-based replication isn't exactly necessary, except at Twitter/Facebook scale, but if you reach that stage you can hire a libtorrent author.
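To make the "missing middleware" from point 2) a bit more concrete, here is a minimal sketch of the routing and fusing part only. Everything in it is hypothetical: `SubQuery`, `QueryFuser`, the `kind` labels, and the stub backends are made up for illustration and don't call zombodb, pg_strom, or presto. Real backends would be plugged in where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SubQuery:
    kind: str     # hypothetical label, e.g. "fulltext", "gpu" or "sql"
    payload: str  # whatever the chosen backend understands


# A backend is just a callable that takes a payload and returns result rows.
Backend = Callable[[str], List[str]]


class QueryFuser:
    """Hypothetical middleware: routes sub-queries to backends and fuses results."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, kind: str, backend: Backend) -> None:
        self._backends[kind] = backend

    def run(self, plan: List[SubQuery]) -> List[str]:
        fused: List[str] = []
        for sub in plan:
            backend = self._backends.get(sub.kind)
            if backend is None:
                raise ValueError(f"no backend registered for {sub.kind!r}")
            # Results are concatenated in plan order; a real fuser would
            # need joins, ranking, and so on.
            fused.extend(backend(sub.payload))
        return fused


# Stub backends standing in for a full-text index, a GPU-offloaded
# aggregate, and a distributed SQL engine. They only echo their input.
def fulltext_stub(payload: str) -> List[str]:
    return [f"fulltext:{payload}"]


def gpu_stub(payload: str) -> List[str]:
    return [f"gpu:{payload}"]


def sql_stub(payload: str) -> List[str]:
    return [f"sql:{payload}"]


def build_demo() -> QueryFuser:
    fuser = QueryFuser()
    fuser.register("fulltext", fulltext_stub)
    fuser.register("gpu", gpu_stub)
    fuser.register("sql", sql_stub)
    return fuser


if __name__ == "__main__":
    plan = [SubQuery("fulltext", "vespa"), SubQuery("gpu", "sum(x)")]
    print(build_demo().run(plan))
```

The hard parts (a visual query designer, tensor evaluation, writing ML results back into the database) are exactly what this sketch leaves out, which is rather the point of calling it missing.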
It would have the following properties: a decentralized, distributed, resilient, highly available, software-defined storage & retrieval system.
According to http://vespa.ai/#featurematrix:
And yet I have to admit that even if the GitHub repository looks quite chaotic, building an alternative, even from existing technologies, would be a big feat. Initially I would have chosen PostgreSQL as a base, but the "HA layer" is something that shouldn't be decoupled or added as an afterthought. That's why CAS (content-addressable storage) [1] is a much better form of integration. Also, integrating the PostgreSQL engine into, e.g., a ZFS kernel extension would be a mess. And integrating the database engine into a distributed p2p algorithm would only add compatibility issues and no real advantages.
[1] https://en.wikipedia.org/wiki/Content-addressable_storage#Op...
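For anyone unfamiliar with CAS, here is a minimal in-memory sketch of the idea: a blob's key is the hash of its content, so identical data deduplicates and any node can verify what it fetched. The `ContentStore` class is a made-up name for illustration, SHA-256 is my assumption for the hash, and this is not how Infinit or any other real system implements it.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressable store: the key is the SHA-256 of the data."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Storing the same content twice lands on the same key (dedup).
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]  # raises KeyError if the key is unknown
        # Re-hash on read so corrupted or tampered content is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    key = store.put(b"hello vespa")
    assert store.get(key) == b"hello vespa"
    print(key)
```

Because the address is derived from the content, replicas never need to agree on names, only on bytes, which is why replication and integrity checks fit in so naturally.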
PS: Clever acquisition by Docker! "Infinit.sh is a content-addressable and decentralized (peer-to-peer) storage platform that was acquired by Docker Inc." And in my eyes it's one of the best implementations and one of the easiest targets for adding a database layer on top.