by tracker1 2719 days ago
Last time I was involved in hiring I created a very simple coding challenge. I've seen far worse on the other side. The goal was to read in a CSV file, transform it to a different format (newline-separated JSON), Node.js preferred, and put the result on GitHub. There were no restrictions on what libraries could be used, no requirements for unit tests, etc. It's something that literally took me about half an hour to write at the time. Only one candidate had a working example, and even then there were a couple of bugs in the transform, but at least the process worked. It's surprising to me how many junior devs cannot do a simple ETL in a scripting language.
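For a sense of scale, here is a minimal sketch of that kind of CSV to newline-separated JSON transform. It is written in Python rather than the Node.js the challenge preferred, and it is not the original challenge solution: the function names `csv_to_ndjson` and `convert_file` are made up for illustration, and it assumes the first CSV row is a header and that every value can stay a string.

```python
import csv
import json


def csv_to_ndjson(lines):
    """Yield one JSON document per CSV data row.

    `lines` is any iterable of text lines (an open file works).
    Assumption: the first row is the header; values are kept as strings.
    """
    for row in csv.DictReader(lines):
        yield json.dumps(row)


def convert_file(src_path, dst_path):
    """Convert a CSV file to newline-separated JSON (hypothetical helper)."""
    # newline="" lets the csv module handle embedded line breaks itself.
    with open(src_path, newline="") as src, open(dst_path, "w") as dst:
        for doc in csv_to_ndjson(src):
            dst.write(doc + "\n")
```

Rows are processed one at a time, so memory use stays flat regardless of file size. Any real transform (renaming fields, casting types) would go between reading the row and `json.dumps`.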

Last year, I took a contract leading a dev team at a top company here in Austin. The team, largely educated in India, said doing substantial ETL-like cleanup of the pipe-separated 15GB input file would take them about 3 days: 1 day to write the Scala & Java code, and 2 days to process it through the Spark cluster. I shocked them when I called B.S.! I spent the rest of the afternoon showing what you can do with pipelines of grep, cut, paste, and awk. Took 2-3 hours to build the proper pipeline and only 15 minutes to run it on my local laptop hard disk. The sad thing is they were impressed, but still inclined to use the ridiculously complicated cluster pipelines instead, since that was "the way we were taught..."
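The actual pipeline used grep, cut, paste, and awk and isn't shown here. As a rough analog of the same single-pass idea, here is a Python sketch that streams a pipe-separated file line by line, keeps rows matching a pattern (the grep step), and projects a few columns (the cut step). The function name `clean_lines`, the column indexes, and the pattern are all hypothetical, not taken from the real job.

```python
import re


def clean_lines(lines, keep_columns, pattern, delimiter="|"):
    """Stream delimiter-separated lines, filtering and projecting columns.

    keep_columns: zero-based field indexes to keep (hypothetical choice).
    pattern: regular expression a line must match to be kept.
    """
    matcher = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        if not matcher.search(line):      # grep-like row filter
            continue
        fields = line.split(delimiter)
        if max(keep_columns) >= len(fields):
            continue                      # skip malformed, too-short rows
        # cut-like projection, with whitespace trimmed from each field
        yield delimiter.join(fields[i].strip() for i in keep_columns)
```

Because it is a generator over an open file, only one line is held in memory at a time, which is why a 15GB input is workable on a single laptop with this style of processing.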
This sort of stuff isn't taught anywhere though, you have to either be shown it by someone else in a previous job, or have it annoy you enough to figure it out with some serious google fu (and the freedom to follow that path).

It's an unknown unknown, a lot of programmers will simply not know you can use quick and dirty scripts to process data if you're only going to do it once.

I work mainly in .Net and the similar problem I see is devs who don't even know that console apps exist, let alone how to make them, which simplifies prototyping new code immensely.

>This sort of stuff isn't taught anywhere though, you have to either be shown it by someone else in a previous job, or have it annoy you enough to figure it out with some serious google fu (and the freedom to follow that path).

Sure it is taught. It's taught by some people (including me - plug here for my Linux and Python courses, with testimonials: https://jugad2.blogspot.com/p/training.html ) in their Unix / Linux courses, as examples of how to really put the classic (and oft-quoted) Unix philosophy + command-line tools + shell scripting to good use, synergistically ("write small tools to do one job each, well, and connect them by pipelines and I/O redirection, etc. etc.").

It's not even rocket science; bread-and-butter EDP/IT folks (programmers and even operators) (not just clued-in software engineers in product companies) have been using such scripts for decades, routinely, without thinking they are doing anything great or out of the ordinary. (And similarly for other OSs, I'm sure, such as Rexx on some platforms. Not sure what was used on Windows before PowerShell, maybe Perl and/or one of the Unix toolkit clones like MKS Toolkit or Cygwin or UWin - apart from clunky batch file language and DOS/Windows command-line commands).

It's just that it is not so well known nowadays among the (often mainly JS-using) generation, who even write CLI apps in Node.

>It's an unknown unknown, a lot of programmers will simply not know you can use quick and dirty scripts to process data if you're only going to do it once.

It's their loss (and that of the industry), comes from not trying to learn about prior art.

There's even a name for it: NIH syndrome [1], and heck, even that is not new :) Dates from early IBM days or earlier ...

[1] https://en.wikipedia.org/wiki/Not_invented_here

Because for some, tools like awk, sed and perhaps grep are unknown skills.

The focus is on pumping out Java or C++ folks who never were given assignments using said tools and therefore never knew or learned them. Python is the new Perl, not sure what the new awk is.

>Python is the new Perl, not sure what the new awk is.

Red, said sed.

Unfortunately, they probably chose their own solution because, in their minds, it was the easiest way to get it into production in a repeatable way, with the least risk that they would mess it up.