by dublin
2719 days ago
Last year, I took a contract leading a dev team at a top company here in Austin. The team, largely educated in India, said that doing substantial ETL-like cleanup of the pipe-separated 15GB input file would take them about 3 days: 1 day to write the Scala & Java code, and 2 days to process it through the Spark cluster. I shocked them when I called B.S.! I spent the rest of the afternoon showing what you can do with pipelines of grep, cut, paste, and awk. It took 2-3 hours to build the proper pipeline and only 15 minutes to run it on my local laptop hard disk. The sad thing is they were impressed, but still inclined to use the ridiculously complicated cluster pipelines instead, since that was "the way we were taught..."
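For anyone who hasn't seen this style of pipeline, here is a minimal sketch of the idea, not the actual pipeline from that job. The file names, the column layout (id|name|amount|status), and the cleanup rules are all made up for illustration, and paste is left out to keep it short.

```shell
#!/bin/sh
# Hypothetical sample input: pipe-separated, columns id|name|amount|status.
# The real file's layout and cleanup rules are not known; these are assumptions.
cat > sample.psv <<'EOF'
id|name|amount|status
1| Alice |10.50|ok
2|Bob||ok
3|Carol|7|DELETED
4| Dave |3.25|ok
EOF

# grep -v drops rows whose last field is DELETED,
# cut keeps only the first three columns,
# awk skips the header, drops rows with an empty amount,
# and trims leading/trailing spaces from the name field.
grep -v '|DELETED$' sample.psv \
  | cut -d'|' -f1-3 \
  | awk -F'|' -v OFS='|' 'NR > 1 && $3 != "" { gsub(/^ +| +$/, "", $2); print }' \
  > clean.psv

cat clean.psv
```

Every stage streams line by line, so memory use stays flat no matter how big the input is, which is a large part of why a single laptop can get through a multi-gigabyte file this way.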
It's an unknown unknown: a lot of programmers simply don't know that you can use quick and dirty scripts to process data if you're only going to do it once.
I work mainly in .NET, and a similar problem I see is devs who don't even know that console apps exist, let alone how to make them, even though a console app simplifies prototyping new code immensely.