Hacker News
by politelemon 1043 days ago
I've noticed that too. I think the marketing is definitely working: I'm seeing a few organisations start to shift more and more workloads onto Snowflake, and some are also publishing datasets on their marketplace.

One of their most interesting upcoming offerings is Snowpark, which lets you run a Python function as a UDF within Snowflake. This way you don't have to move data around; you just call the function as part of your normal SQL statements. It's also possible to pickle a function and send it over, so conceivably one could train a data science model and run it as part of a SQL statement. This could get very interesting.
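
Snowpark itself needs a Snowflake account, so here is a local analogue of the same idea using Python's built-in `sqlite3` module, not Snowpark's API: a Python function registered with `Connection.create_function` becomes callable from plain SQL, and a "model" trained outside the database is pickled to bytes and loaded inside the UDF. One caveat: the standard `pickle` module stores functions by reference (module and name), not by value, so this sketch pickles the fitted parameters rather than a function object. The toy least-squares model and the `readings` table are illustrative assumptions.

```python
import pickle
import sqlite3


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (toy 'model')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# "Train" outside the database, then serialize the fitted parameters to bytes.
model_bytes = pickle.dumps(fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))


def make_predict_udf(blob):
    """Unpickle the model once and return a scalar function SQL can call per row."""
    model = pickle.loads(blob)

    def predict(x):
        return model["slope"] * x + model["intercept"]

    return predict


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name "predict" with one argument.
conn.create_function("predict", 1, make_predict_udf(model_bytes))

conn.execute("CREATE TABLE readings (id INTEGER, x REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 4.0), (2, 10.0)])
rows = conn.execute("SELECT id, predict(x) FROM readings ORDER BY id").fetchall()
print(rows)  # [(1, 9.0), (2, 21.0)]
```

The model is unpickled once when the UDF is created rather than once per row, which is the same reason you would load a model at UDF initialisation in any engine.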

In theory, fine. Then you look at the walled garden that is Snowpark: only "approved" Python libraries are allowed there. It will be a very restrictive set of models you can train, and very restrictive feature engineering in Python. And wait, aren't Python UDFs super slow (GIL)? What about Pandas UDFs (wait, that's PySpark...)?
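
One way to see why the comment contrasts plain Python UDFs with Pandas UDFs: in the row-at-a-time model the engine invokes the Python function once per row, while a vectorized UDF receives a whole batch per call, so per-call overhead is paid far less often. The sketch below only counts invocations; `run_scalar` and `run_batched` are hypothetical stand-ins for an engine's UDF executor, not any Snowflake or PySpark API, and a real PySpark Pandas UDF would receive a `pandas.Series` per batch rather than a list.

```python
from typing import Callable, List


def run_scalar(udf: Callable[[float], float], column: List[float]):
    """Row-at-a-time: the engine crosses into Python once per row."""
    calls = 0
    out = []
    for value in column:
        calls += 1
        out.append(udf(value))
    return out, calls


def run_batched(udf: Callable[[List[float]], List[float]], column: List[float], batch_size: int):
    """Vectorized style: the engine hands Python a whole batch per call."""
    calls = 0
    out = []
    for start in range(0, len(column), batch_size):
        calls += 1
        out.extend(udf(column[start:start + batch_size]))
    return out, calls


def scale_one(x: float) -> float:
    return x * 2.0


def scale_batch(xs: List[float]) -> List[float]:
    return [x * 2.0 for x in xs]


column = [float(i) for i in range(10_000)]
scalar_out, scalar_calls = run_scalar(scale_one, column)
batched_out, batched_calls = run_batched(scale_batch, column, batch_size=1_000)
print(scalar_calls, batched_calls)  # 10000 10
```

Both paths produce identical results; only the number of boundary crossings differs.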

Having worked with a team using Snowpark, there are a couple of things that bother me about it as a platform. For example, it only supported Python 3.8 until 3.9 and 3.10 recently entered preview. It feels a bit like a rushed project designed to compete with Databricks/Spark at the bullet-point level, but not quite at the same quality level.

But that's fine! It has only existed for around a year in public preview, and it appears to be improving quickly. My issue was with how aggressively Snowflake sales tried to push it as a production-ready ML platform. Whenever I asked about version control/CI, model versioning/ops, package managers, etc., the sales engineers and data scientists consistently oversold the product.

Yeah, it's definitely not ready for modelling. It's pretty rocking for ETL though, and much easier to test and abstract than regular SQL. Granted, it's a PySpark clone, but our data is already in Snowflake.
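
The "easier to test and abstract" point comes from the DataFrame style that PySpark popularised: transformations are ordinary Python functions that build a query lazily, and nothing runs until the result is collected. Below is a toy sketch of that idea, not Snowpark's API. The hypothetical `LazyFrame` class compiles each step into a nested SQL subquery, and the hypothetical ETL step `active_customer_names` is unit-tested against an in-memory SQLite database instead of a warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazily evaluated frame: each method wraps the query so far in a subquery."""

    sql: str
    params: Tuple = ()

    @staticmethod
    def table(name: str) -> "LazyFrame":
        return LazyFrame(f"SELECT * FROM {name}")

    def filter(self, condition: str, *params) -> "LazyFrame":
        # Toy only: condition is raw SQL; just the values go through placeholders.
        return LazyFrame(f"SELECT * FROM ({self.sql}) WHERE {condition}", self.params + params)

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(f"SELECT {', '.join(columns)} FROM ({self.sql})", self.params)

    def collect(self, conn: sqlite3.Connection):
        # Nothing touches the database until here.
        return conn.execute(self.sql, self.params).fetchall()


def active_customer_names(frame: LazyFrame, min_orders: int) -> LazyFrame:
    """A reusable ETL step written as an ordinary, unit-testable Python function."""
    return frame.filter("active = 1").filter("orders >= ?", min_orders).select("name")


# Unit test against an in-memory database instead of the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, active INTEGER, orders INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("ana", 1, 5), ("bo", 0, 9), ("cy", 1, 1)],
)
result = active_customer_names(LazyFrame.table("customers"), min_orders=2).collect(conn)
print(result)  # [('ana',)]
```

Because each step is a plain function returning a new frame, steps can be composed, reused across pipelines, and tested with small fixtures, which is much harder with one long SQL script.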

Disclaimer: Snowflake employee here. You can add any Python library you want, as long as it and its dependencies are 100% Python. It takes about a minute: pip install the package, zip it up, upload it to an internal Snowflake stage, then reference it in the IMPORTS=() clause of your Python UDF definition. I did this with pydicom just the other day and it worked a treat. So yes, not the depth and breadth of the entire Python ecosystem, but 1500+ native packages/versions on the Anaconda repo, plus this technique? Hardly a "walled garden".
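
The pip-install, zip, stage, IMPORTS workflow can be sketched in Python. This is a minimal helper under stated assumptions, not a Snowflake tool: it expects the package to have already been installed into a local folder (for example with `pip install --target`), refuses to zip anything containing compiled-extension suffixes (a crude heuristic for the "100% Python" requirement), and only generates the statement text rather than running it. The `PUT ... AUTO_COMPRESS=FALSE` form and the `IMPORTS = ('@stage/file.zip')` clause are assumptions about Snowflake's syntax to check against its documentation; `my_stage` and `mypkg` are hypothetical names.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List, Tuple

# Heuristic: file suffixes that indicate compiled code rather than pure Python.
COMPILED_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def compiled_files(package_dir: Path) -> List[Path]:
    """List files under package_dir that look like compiled extensions."""
    return sorted(
        p for p in package_dir.rglob("*")
        if p.is_file() and p.name.endswith(COMPILED_SUFFIXES)
    )


def zip_pure_python_package(package_dir: Path, zip_path: Path) -> Path:
    """Zip a pure-Python package, keeping its top-level folder name in the archive."""
    bad = compiled_files(package_dir)
    if bad:
        raise ValueError(f"not pure Python, compiled files found: {bad}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Archive name like "mypkg/core.py", so the package sits at the zip root.
                zf.write(path, path.relative_to(package_dir.parent).as_posix())
    return zip_path


def snowflake_sql(zip_path: Path, stage: str) -> Tuple[str, str]:
    """Assumed Snowflake syntax for the upload and the UDF's IMPORTS clause."""
    put = f"PUT file://{zip_path.resolve().as_posix()} @{stage} AUTO_COMPRESS=FALSE"
    imports = f"IMPORTS = ('@{stage}/{zip_path.name}')"
    return put, imports


def demo() -> Tuple[List[str], str, str]:
    """Build a throwaway package, zip it, and return the archive names plus SQL text."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "mypkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "core.py").write_text("X = 1\n")
        zip_path = zip_pure_python_package(pkg, Path(tmp) / "mypkg.zip")
        with zipfile.ZipFile(zip_path) as zf:
            names = sorted(zf.namelist())
        put, imports = snowflake_sql(zip_path, "my_stage")
    return names, put, imports


print(demo())
```

Checking for compiled files before zipping makes the pure-Python constraint fail loudly on your machine instead of at import time inside the UDF.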

Good luck trying to install any non-trivial Python library this way. And with AI moving so fast, do you think people will accept that they can't use the libraries they need because you haven't approved them yet?!

> run a Python function as a UDF

Is that a differentiator? I'm unfamiliar with Snowpark's actual implementation, but I know SQL Server introduced in-engine R back in 2016, with Python following in 2017.