|
|
|
|
|
by cldellow
3689 days ago
|
|
We use it for two things: * distributed machine learning tasks using their built-in algorithms (although note that some of them, e.g. LDA, just fall over with not-even-that-big datasets) * as a general fabric for doing parallel processing, like crunching terabytes of JSON logs into Parquet files, doing random transformations of the Common Crawl As a developer, it's really convenient to spin up ~200 cores on AWS spot instances for ~$2/hr and get fast feedback as I iterate on an idea. |
|