by dajohnson89
3410 days ago
I'm not sure why you got downvoted. It's a valid point, and it makes intuitive sense. Is there a clear-cut answer as to whether one should choose a distributed solution? It seems to me that at the terabyte scale, choosing a non-distributed solution is asking for trouble. A quick search indicates the largest HDD you can buy is around 8TB.
It means that the size of the dataset is not the only factor. You also need to take into account the operations performed on each element or document, the size of the intermediate datasets, the size of the final results, and other details such as encoding.
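
To make that concrete, here is a rough back-of-envelope sketch in Python. The function names, the multiplier parameters, and the example numbers are all made up for illustration; only the ~8TB disk figure comes from the comment above. It is a way to frame the question, not a sizing tool.

```python
def peak_storage_tb(input_tb, intermediate_factor, result_factor,
                    encoding_overhead=1.0):
    """Rough peak storage if input, intermediates and results coexist.

    All factors are hypothetical multipliers relative to the input size.
    """
    raw = input_tb * encoding_overhead          # input as actually stored
    intermediate = raw * intermediate_factor    # e.g. shuffle/join output
    result = raw * result_factor                # final output
    return raw + intermediate + result


def fits_on_one_machine(peak_tb, disk_tb=8.0, n_disks=1):
    """True if the peak storage fits on the given local disks."""
    return peak_tb <= disk_tb * n_disks


def single_machine_hours(n_items, seconds_per_item, cores):
    """Wall-clock hours assuming perfectly parallel per-item work."""
    return n_items * seconds_per_item / cores / 3600


# Same 2TB input, two very different jobs (illustrative numbers only).
filter_job = peak_storage_tb(2.0, intermediate_factor=0.1, result_factor=0.01)
join_job = peak_storage_tb(2.0, intermediate_factor=5.0, result_factor=1.0)

print(fits_on_one_machine(filter_job))  # small intermediates
print(fits_on_one_machine(join_job))    # intermediates dominate
```

With these assumed factors the filter-style job stays well under a single 8TB disk, while the join-style job needs roughly 14TB, even though both start from the same 2TB input. The per-item cost matters in the same way: `single_machine_hours` grows linearly with the work done on each element, regardless of how small the dataset is on disk.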