by dfah 2139 days ago
Data analysis is, alas, a big field. I would say you should assess a few factors:

1) How much basic-ish learning in the data science / statistics / ML areas you expect to be doing yourself. The Python ecosystem will probably make this much faster.

2) How you expect to scale & productionize your analysis tasks (if at all). In my experience Python is a second-class citizen in the Spark world, not far above Clojure's third-class status, and the throughput gains from Clojure compiling natively to the JVM may outweigh the relative convenience of the PySpark interface. On the other hand, TensorFlow's & TFX's interfaces are basically designed from the ground up for Python.

3) Which major techniques & corresponding libraries you expect to use (e.g. MCMC/Stan, Pyro, TensorFlow, scipy, scikit-learn). Some of these might rule out one language or the other (more likely eliminating Clojure).

4) How important data visualization will be for you. This aspect of the work will be much easier & richer in Python than in Clojure.

5) What kind of data transformation & validation you expect to do. If this is largely statistical in nature (e.g. rescaling distributions) it's probably a wash. If it is viz-heavy, that favors Python. If it involves complicated structured data, I'd recommend Clojure.
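To make the "largely statistical" case in point 5 concrete, here is a rough sketch of what rescaling a distribution can look like in plain Python. The function names `standardize` and `min_max` are my own, picked for illustration, and the sketch assumes a small in-memory list of numbers; in practice you would more likely reach for scipy or scikit-learn.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot min-max scale constant data")
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    print(standardize(data))
    print(min_max(data))
```

Transformations this simple are a few lines in either language, which is why I'd call that case a wash.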