by deshpand
1708 days ago
Spark may be a mature solution for truly big data processed in a SQL-like fashion, 1 TB and more. But I constantly see it being misused, even with datasets as small as 5 GB. Maybe the valuation of the company reflects this 'growth' and 'adoption'. And data locality is a thing: you can't efficiently read terabytes from object storage (over HTTP). The batch-oriented MapReduce model is also not conducive to the many ML algorithms where state needs to be passed around.
I often see engineers wanting to use it because they see it as a competency that will unlock jobs at bigger, more lucrative companies. Also, startups kind of beg for it because the business keeps asking, "Will this tech scale 100x?" If you ask for a solution that scales 100x, and your problems aren't well-defined yet, and by the way it would be nice if it does streaming, too, since we might need that someday, your engineers are going to err on the side of using a big, complete solution.