by knn 3952 days ago
The Databricks platform should solve exactly your problem: reusable data pipelining and transformation. I saw a demo of it last night and it was extremely slick. Their product is amazing; it makes data pipelining far easier than setting up a Hadoop cluster and running Hive, etc. (I don't work for them, but if any Databricks employee sees this, please hire me!) It runs on a Spark cluster on AWS, which is much more modern and powerful than SAS/Excel/SQL. Since you already know how to program, Spark shouldn't be too hard to pick up (it even has Python bindings).