by adgjlsfhk1
1262 days ago
imo this is wrong in a few ways. firstly, your data fits in a computer. you can get a computer with a petabyte of storage if you need to, and Spark is slow enough that doing it on a single computer will probably be faster. also, while a computer with 1 PB of storage is expensive, it's less expensive than splitting your data up, in terms of hardware, maintenance, and software dev time costs. secondly, your data probably fits in RAM if you actually try. you can get a computer with 60 TB of RAM, which is an awful lot of data.