by bxji
1786 days ago
Scale is certainly a part of it. I work on one of the data platform teams at a social media company. Between our 3 HDFS clusters, we're storing more than an exabyte of data. At our scale, we have to tune our workloads carefully to make sure that problems of scale are not noticeable to internal customers (data scientists, analysts, etc.). We basically have an entire org of highly paid engineers focused on making sure people can use that data efficiently, with teams working on storage, on Spark, on Presto/Trino, on data ingestion, and so on. My understanding is that we're investing in engineers to improve data science productivity: data scientists can do analysis without having to understand the internals of all our systems, and executives can make informed decisions backed by data to continue printing money. Or something like that...
Maybe you could make it a SaaS, so that if you can optimize the ETL process, companies only need to hire data scientists.