by truth_seeker
2429 days ago
Say, for example, I am using PostgreSQL 12 + the Citus extension:

Data cleaning -> PL/pgSQL and various built-in functions for transforming the data (or a new UDF, if required at all)
Processing -> PostgreSQL parallel query processing on the local node, and the Citus extension for distributed computing and sharding
Analytics -> Many options here: materialized views, triggers, streaming computation with the PipelineDB extension, or logical replication for stream computation
ML -> PG supports a variety of statistical functions. It also supports the PL/R and PL/Python extensions to interface with ML libraries.

Also, there are various kinds of Foreign Data Wrappers supported by PG - https://wiki.postgresql.org/wiki/Foreign_data_wrappers
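To make the Processing and Analytics steps a bit more concrete, here is a rough sketch of the SQL such a setup might issue. It only builds the statements as strings and does not connect to a database. The table name `events` and the columns `user_id` and `created_at` are made up for illustration; `create_distributed_table` is the Citus function for sharding a table, and the materialized view statements are plain PostgreSQL.

```python
def pipeline_statements(table="events", shard_key="user_id"):
    """Return example SQL for sharding a table and precomputing an aggregate.

    The table and column names are hypothetical placeholders.
    """
    return [
        # Processing: ask Citus to shard the table across workers by shard_key.
        f"SELECT create_distributed_table('{table}', '{shard_key}');",
        # Analytics: precompute a daily count per shard key as a materialized view.
        # 'created_at' is an assumed timestamp column.
        f"CREATE MATERIALIZED VIEW {table}_daily AS "
        f"SELECT {shard_key}, date_trunc('day', created_at) AS day, count(*) AS n "
        f"FROM {table} GROUP BY 1, 2;",
        # Materialized views are not updated automatically; refresh on a schedule.
        f"REFRESH MATERIALIZED VIEW {table}_daily;",
    ]


if __name__ == "__main__":
    for stmt in pipeline_statements():
        print(stmt)
```

In practice these would be run through a client such as psql or a driver, and the refresh would typically be scheduled rather than run by hand.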
PG is great, but it's not suitable as a feature store and sure as hell not suitable for fanning out ML workloads. In a modern ML stack, PG might play the role of the slow but reliable master store that the rest of the ML pipeline feeds off.