by mumblemumble 2418 days ago
Divide that file size by 10, and you're still in the range where I've had to argue that rolling out Spark is overkill.

People will disbelieve that, but it's absolutely true... I'll never forget the interview I did with an engineer who described an elaborate Hadoop-based solution to some past problem. When I asked him what type of data he was working with, he said, "Here, I'll show you," then whipped out his laptop and showed me a spreadsheet. It wasn't an extract of the data. It was literally a spreadsheet, manageable on a laptop, that he somehow decided needed a Hadoop cluster to process. (Also, who shows data from their current employer to a prospective employer? Weird but true.)

I had an interesting experience a while back where it came to light that I was working on the same problem as another team in the org (this was a huge multinational), so a meeting was arranged so we could compare notes. The other team was slightly shocked to see that I could train a model in a minute or two, where it took them an hour or two using essentially the same algorithm.

They insisted that shouldn't be, because I was doing it on my laptop and they were using a high performance computing cluster. They of course wanted to know how my implementation could be so much faster despite running on only a single machine. I didn't have the heart to suggest that maybe it was because, not despite.

Ironically, I also got the implementation done in a lot fewer person-hours. I just did a straight code-up of the algorithm in the paper, where they had to do a bunch of extra work to figure out how to adapt it to scale-out.

This isn't to say that big data doesn't happen. Just that it's a bit like sex in high school: People talk about it a lot more than they actually have it, perhaps because everyone's afraid their friends will find out they don't have it.

What reasons could he have had for wanting a cluster to process that spreadsheet? Just curious what problem they were trying to solve.