by arvinaminpour 2408 days ago
I'm always surprised at how people utilize Jupyter notebooks. What kinds of platforms/software do these firms use to perform the analysis?
2 comments

It's a bit of a mess right now. AWS is eating this market, but frankly their products are not that great. Their ETL tool, which handles parallel execution, is called Glue; it is essentially a managed cloud version of Spark. Glue is supposed to integrate with SageMaker, which is basically your standard Jupyter notebook experience. Spark, though, is not that intuitive and is not the tool data scientists use for exploration. So data scientists explore and build models, and then rebuild them to run in Spark. What we really need is a way to seamlessly scale pandas or R dataframes across clusters. Dask looks promising, but it faces an uphill battle against AWS and company and their inferior but convenient tooling.
A friend of mine is trying to build the Databricks of Dask for exactly that reason.
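To make the "scale a dataframe across a cluster" idea above concrete, here is a rough, standard-library-only sketch of the pattern involved: split the rows into partitions, aggregate each partition independently, then combine the partial results. Dask's dataframe applies the same idea to partitions that are themselves pandas dataframes. This is not Dask's or Spark's actual code; the function names (`partial_sums`, `combine`, `partitioned_group_mean`) and the toy data are made up for illustration, and a thread pool stands in for cluster workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Per-partition step: running sum and count of values for each key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def combine(partials):
    """Merge the per-partition sums and counts, then finish the mean per key."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, n) in part.items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in total.items()}


def partitioned_group_mean(rows, n_partitions=4):
    """Group-by mean computed partition by partition (hypothetical helper)."""
    # Round-robin split; a real system would partition by file, index range, etc.
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    # Threads stand in for cluster workers in this sketch.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sums, partitions))
    return combine(partials)


# Toy data: (key, value) pairs standing in for two dataframe columns.
rows = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]
result = partitioned_group_mean(rows, n_partitions=2)
```

The point is that a mean cannot be combined directly across partitions, but sums and counts can. That rewrite is the part a tool has to do for you if the exploratory pandas-style code is going to run unchanged on a cluster.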
Proprietary software. This is mostly stuff used for options pricing. Every options trading house has a substantial investment in proprietary options pricing software, mostly using customized adaptations of these models rather than what you would find in the literature.