by RyanHamilton
1916 days ago
If you can't compute where the data lives, you end up pulling all of it back to calculate against it. Assuming the transfer cost for the full data set is >50% higher than the cost of first performing at least some calculations that reduce its size, computing where the data lives is worth it.

Many orgs these days store all their data in data lake, shared-disk architectures and pull down only the subsets they need. The performance hit of pulling data over a high-bandwidth channel such as S3 to EC2 is far more acceptable to companies than storing everything on expensive compute instances just so that the "data would be there", ready for querying if somebody ever needs it.
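
To make the trade-off in this thread concrete, here is a rough back-of-envelope sketch in Python. It compares pulling the whole data set against reducing it at the source and pulling only the result. The function names and every number below are hypothetical and chosen purely for illustration; real S3 to EC2 throughput and compute times vary a lot by instance type and workload.

```python
def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a link of bandwidth_gbps gigabits/s."""
    return size_gb * 8 / bandwidth_gbps


def pull_everything(size_gb: float, bandwidth_gbps: float,
                    local_compute_s: float) -> float:
    """Total time when the full data set is transferred, then computed on locally."""
    return transfer_seconds(size_gb, bandwidth_gbps) + local_compute_s


def reduce_at_source(reduced_gb: float, bandwidth_gbps: float,
                     source_compute_s: float, local_compute_s: float) -> float:
    """Total time when the data is filtered/aggregated at the source first,
    and only the reduced result (reduced_gb) is transferred."""
    return (source_compute_s
            + transfer_seconds(reduced_gb, bandwidth_gbps)
            + local_compute_s)


if __name__ == "__main__":
    # Hypothetical scenario: 1 TB data set, 10 Gbit/s link,
    # and a source-side filter that shrinks the data to 50 GB.
    full = pull_everything(size_gb=1000, bandwidth_gbps=10, local_compute_s=60)
    reduced = reduce_at_source(reduced_gb=50, bandwidth_gbps=10,
                               source_compute_s=120, local_compute_s=10)
    print(f"pull everything:  {full:.0f} s")
    print(f"reduce at source: {reduced:.0f} s")
```

The same arithmetic cuts the other way too: if the subset being pulled is small relative to the link, the transfer term stays small, which is roughly the argument for the data lake approach.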