| HN Mirror

by marcinzm 1984 days ago

> UDF support isn't really the same, to be honest. You're still prisoner of the select from pattern. Don't get me wrong, SQL is wonderful where it works, but it doesn't work for everything that I need.

Not sure how it's different from what you can do in Spark in terms of data transformations. Taking a list of objects as an argument basically allows your UDF to do arbitrary computations on tabular data.
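To make that concrete, here's a rough sketch in plain Python rather than any particular engine's UDF API. The names `longest_rising_run` and `apply_udf_per_group` and the sample rows are all made up for illustration. The driver just stands in for something like grouping by a key and handing the aggregated rows to the UDF as a list.

```python
from collections import defaultdict


def longest_rising_run(events):
    """Hypothetical UDF body: gets every event for one key as a list of dicts.

    Returns the length of the longest run of strictly increasing values,
    ordered by timestamp. Awkward in plain SQL, trivial with a list in hand.
    """
    ordered = sorted(events, key=lambda e: e["ts"])
    best = run = 1 if ordered else 0
    for prev, cur in zip(ordered, ordered[1:]):
        run = run + 1 if cur["value"] > prev["value"] else 1
        best = max(best, run)
    return best


def apply_udf_per_group(rows, key, udf):
    """Stand-in for: group rows by `key`, collect each group into a list,
    and call the UDF once per group (an assumption, not a real engine API)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: udf(v) for k, v in groups.items()}


# Made-up sample data, purely illustrative.
rows = [
    {"user": "a", "ts": 1, "value": 3},
    {"user": "a", "ts": 2, "value": 5},
    {"user": "a", "ts": 3, "value": 4},
    {"user": "a", "ts": 4, "value": 6},
    {"user": "a", "ts": 5, "value": 9},
    {"user": "b", "ts": 1, "value": 2},
    {"user": "b", "ts": 2, "value": 1},
]

result = apply_udf_per_group(rows, "user", longest_rising_run)
print(result)
```

Once the function receives the whole group as a list, what happens inside is ordinary code, so the query shape stops being the constraint.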

> I forgot about XGBoost, but I'm a big fan of unsupervised methods (as input to supervised methods, mostly) and Spark has a bunch of these.

That's true, distributed unsupervised methods aren't done in most other places I know of. I'm guessing there are ways to do that with neural networks, although I haven't looked into it. The datasets I deal with have structure in them between events even if they're unlabeled.

> I completely agree that it's faster than Spark, but it's also super-expensive and more limited. I suspect it would probably be cheaper to run a managed Spark cluster vs Snowflake and just eat the performance hit by scaling up.

I used to do that on AWS. For our use case, Athena ate its lunch in terms of performance, latency, and cost by an order of magnitude. Snowflake is priced based on demand, so I suspect it'd do likewise.

1 comment

tormeh 1984 days ago

Spark has a superset of the functionality Athena has. Athena is faster, but it's also very limited. They're not designed to do the same thing.
