by angrybits
3979 days ago
> The one big vertically scaled database at the hub of everything is definitely disappearing. Maybe in Startupville, CA. But I think you're forgetting that it's a big world out there and there are lots of systems that are built on vertically scaled relational engines that practically print money. The vast majority of companies out there do not have Twitter-scale engineering problems to solve.
And the big EDW that you used to find powering everything has been broken up over the years into unintegrated silos, e.g. ERP, Web, Salesforce, Payroll, etc. The big trend now is to reintegrate all this data and do analytics on it. Doing that requires (a) major ETL work between completely different schemas, then (b) your data science/analytics work. In semi-real time.
The article is referring to this type of workload, since it is Spark's bread and butter. You land the data in HDFS, use Spark SQL to run the ETL/analytics jobs, and then output the results as a single enterprise view for reporting, marketing, etc. And yes, this is identical to what Twitter's analytics team would be doing.
With cloud tools from Azure, IBM, and Amazon, this sort of analytics is going to become much more commonplace. All using SQL the language, but not SQL the database.
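To make the "ETL between different schemas, then one enterprise view" step concrete, here is a rough sketch. It is not Spark: it uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere, and the point is only the shape of the SQL. The table names (`erp_orders`, `crm_contacts`), columns, key format and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical source silos with mismatched schemas (all names are invented).
# The ERP side keys customers by integer and stores money in cents;
# the CRM side keys them by strings like 'C-1'.
SETUP = """
CREATE TABLE erp_orders (cust_no INTEGER, amount_cents INTEGER);
CREATE TABLE crm_contacts (customer_id TEXT, full_name TEXT);
INSERT INTO erp_orders VALUES (1, 1500), (1, 2500), (2, 999);
INSERT INTO crm_contacts VALUES ('C-1', 'Ada'), ('C-2', 'Grace'), ('C-3', 'Linus');
"""

# ETL step: normalise keys and units, then join into a single view.
ENTERPRISE_VIEW = """
CREATE VIEW enterprise_view AS
SELECT c.full_name AS customer,
       COUNT(o.cust_no) AS orders,
       COALESCE(SUM(o.amount_cents), 0) / 100.0 AS revenue
FROM crm_contacts AS c
LEFT JOIN erp_orders AS o
       ON o.cust_no = CAST(SUBSTR(c.customer_id, 3) AS INTEGER)
GROUP BY c.customer_id, c.full_name;
"""


def build_enterprise_view():
    """Load the sample silos into an in-memory database and create the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executescript(ENTERPRISE_VIEW)
    return conn


def report(conn):
    """Read the unified view the way a reporting tool would."""
    return conn.execute(
        "SELECT customer, orders, revenue FROM enterprise_view ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    for row in report(build_enterprise_view()):
        print(row)
```

In the setup described above, the same kind of query would presumably run through Spark SQL over files landed in HDFS rather than against a local database, which is the "SQL the language, not SQL the database" part.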