by riku_iki
2428 days ago
> hell not suitable to fan out ML workloads
Depends on the scale? Not everyone processes petabytes of data.
> PG might play the role of the slow
You have any benchmark in your hand to support this? I believe highly optimized C code in PG can be significantly faster than Scala inside Spark.
There's no question about this. If you can express your task in terms of PG on a single instance, then you probably should.
When you get to more complex tasks, like running input through GloVe and pushing ngrams to a temporal store, PG offers very little, which is fine: it's not at all what PG is designed for. For those workloads inter-node IO matters more than single-node perf, which is why Spark is used despite being terribly inefficient (in fact it's so inefficient that for intermediate-sized workloads you'd be better off vertically scaling a single node and using something else). PG won't help at all with these tasks.
Also, that smorgasbord of extensions GP listed isn't offered by any cloud vendor as a managed service afaik, meaning you must roll and manage your own. Depending on your needs, that might be a showstopper.