
1) Get PostgreSQL extensions via the "package manager" pgxnclient
  1.1) pg_bouncer - For connection pooling
  1.2) yoke - As a high-availability cluster manager with auto-failover and automated cluster recovery
  1.3) prestodb.io - Distributed SQL query engine for pgsql
  1.4) pglogical - Logical streaming replication using a publish/subscribe model
  1.5) pg_lambda - To create your own AWS (meta) Lambda
  1.6) pg_strom - To offload tasks to the GPU
  1.7) zombodb - To utilize full-text searching via indexes backed by Elasticsearch
2) Put it all together with pglogical and presto to separate GPU- and CPU-intensive tasks.
  2.1) "Build Missing Middleware" - To design/fuse a query visually that combines multiple backends
    2.1.1) Create a binary data-stream by integrating pg_lambda, pg_strom, presto and zombodb
    2.1.2) "Build Missing Middleware" - A tensor processing extension to use ML model evaluations
    2.1.3) "Use Missing Middleware" - For data-processing via machine-learning models
    2.1.4) "Use Missing Middleware" - To output ML-processed results into the database
  2.2) Partition these queries using "pg_lambda + middleware" to create accelerated and fused query results

I now think basing this on PostgreSQL was wrong. I believe a meaningful approach to building a Vespa alternative yourself is to base it on content-addressable storage (CAS)[1] and add a DB layer on top (e.g. using AUFS).

It would be a decentralized, distributed, resilient, highly available, software-defined storage and retrieval system.
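To make the "DB layer on top of content-addressable storage" idea a bit more concrete, here is a minimal single-process sketch. It is only an illustration under my own assumptions: the names `ContentStore` and `TableLayer` are hypothetical, the store is an in-memory dict rather than a distributed system, and it has nothing to do with how Vespa, Infinit or AUFS are actually implemented.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressable store: blobs are keyed by their SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # digest (hex string) -> raw bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data  # identical content always maps to the same key
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Integrity check: the address must still match the content.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("corrupted blob")
        return data


class TableLayer:
    """Hypothetical DB layer: maps primary keys to blob addresses in the store."""

    def __init__(self, store: ContentStore):
        self._store = store
        self._index = {}  # primary key -> digest of the current row version

    def upsert(self, key, row: dict) -> str:
        # Canonical serialization, so equal rows produce equal addresses.
        blob = json.dumps(row, sort_keys=True).encode("utf-8")
        digest = self._store.put(blob)
        self._index[key] = digest
        return digest

    def select(self, key) -> dict:
        return json.loads(self._store.get(self._index[key]))


store = ContentStore()
users = TableLayer(store)
users.upsert(1, {"name": "alice"})
print(users.select(1))
```

The point of the split is that the lower layer only knows immutable blobs and their hashes, so replication and integrity checks can live there, while the mutable part (the key-to-address index) stays small.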

According to http://vespa.ai/#featurematrix:

        FEATURE                     VESPA   ELASTIC SEARCH   RELATIONAL DATABASES
        ACID transactions                                    •••
        Optimized for analytics             •••              ••
        Optimized for serving       •••     •                ••
        Scalable                    •••     ••               •
        Easy to operate at scale    ••                       •
        Text search                 •••     ••               •
        Machine learned ranking     •••     •                2.1.2) - 2.1.4)
        Middleware logic container  •••                      1.4)
        Live reconfiguration        •••                      1.2)

And yet I have to admit that, even if the GitHub repository looks quite chaotic, building an alternative, even from existing technologies, would be a big feat.

Initially I would've chosen PostgreSQL as a base, but the "HA layer" is something that shouldn't be decoupled or treated as an afterthought. That's why CAS is a much better basis for integration. Also, integrating the PostgreSQL engine into, say, a ZFS kernel extension would be a mess. And integrating the database engine into a distributed p2p algorithm would only add compatibility issues and no real advantages.

[1] https://en.wikipedia.org/wiki/Content-addressable_storage#Op...

PS: Clever acquisition by Docker! "Infinit.sh is a content-addressable and decentralized (peer-to-peer) storage platform that was acquired by Docker Inc." In my eyes it is one of the best implementations and one of the easiest targets for adding a database layer on top.