by xxbondsxx
3874 days ago
Hive and Hadoop are offline -- it can take ~45 minutes to execute a query on our entire user table (even longer if it involves joins), and at certain times of day it's slower (usually during work hours). Not only that, but once the query finishes, some engineer has to copy and paste the results into a script that would likely run on one machine. Doing this as distributed async jobs allowed for a lot more flexibility. Even better, we can change the geographic area as the algorithm runs, and those changes are reflected immediately.