Hacker News
by kenoph 1571 days ago
You can do a lot with just bash + pipes + unix tools. It can get messy as your pipeline grows, though, and there are a lot of edge cases.
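Here is one concrete instance of those edge cases, as a small sketch. The function names `count_naive` and `count_safe` and the temporary files are made up for illustration. Looping over `$(ls)` word-splits filenames that contain spaces. NUL-delimited output from `find -print0` keeps them intact. The code assumes `-mindepth`, `-maxdepth` and `-print0` are available, as they are in GNU and BSD `find`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the entries in a directory two ways.

count_naive() {
  # Word-splits the output of ls, so "my file.txt" becomes two words.
  local n=0
  for f in $(ls "$1"); do n=$((n + 1)); done
  echo "$n"
}

count_safe() {
  # NUL-delimited names survive spaces (and even newlines) intact.
  # Assumes GNU/BSD find for -mindepth, -maxdepth and -print0.
  local n=0
  while IFS= read -r -d '' f; do n=$((n + 1)); done \
    < <(find "$1" -mindepth 1 -maxdepth 1 -print0)
  echo "$n"
}

dir=$(mktemp -d)
touch "$dir/my file.txt" "$dir/other.txt"
count_naive "$dir"   # prints 3: "my", "file.txt", "other.txt"
count_safe "$dir"    # prints 2
rm -rf "$dir"
```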

Relevant: "bashML: Why Spark when you can Bash?" (https://rev.ng/blog/bashml/post.html), aka how to deduplicate git repositories using `comm` + `awk`.
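The post's actual pipeline isn't reproduced here. This is a hedged sketch of the two building blocks it names, assuming line-oriented input such as one repository URL or commit hash per line. `awk '!seen[$0]++'` drops repeated lines while keeping their order. `comm` compares two sorted files: `-12` keeps lines common to both, and `-23` keeps lines found only in the first. The helper names and file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; input is assumed to be one item per line.

# Print each line only the first time it appears, keeping input order.
dedup_lines() {
  awk '!seen[$0]++' "$@"
}

# Print lines present in both files. comm expects sorted input,
# so each file is sorted (and de-duplicated) first.
common_lines() {
  comm -12 <(sort -u "$1") <(sort -u "$2")
}

# Print lines present only in the first file.
only_in_first() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

dir=$(mktemp -d)
printf '%s\n' repo-c repo-a repo-b repo-a > "$dir/mine.txt"
printf '%s\n' repo-b repo-d repo-c > "$dir/theirs.txt"
dedup_lines "$dir/mine.txt"                       # repo-c, repo-a, repo-b
common_lines "$dir/mine.txt" "$dir/theirs.txt"    # repo-b, repo-c
only_in_first "$dir/mine.txt" "$dir/theirs.txt"   # repo-a
rm -rf "$dir"
```

`comm` assumes sorted input, so the helpers sort inside the process substitutions instead of relying on the caller.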


Bash is a godsend for quick debugging, and I can see the temptation to start writing production code in it. IMO it basically boils down to a couple of things:

- large bash scripts are hard to read and maintain
- complex modelling chains need intermediate points in the processing where you can inspect the data

On the latter point, I can't count the number of times being able to query an Athena database has saved me a lot of headaches. The overhead of Parquet and the AWS bill pale in comparison. I'm sure almost everyone already agrees with me here, but it's a classic case of the whole being more than the sum of its parts.
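A rough shell-level analogue of those intermediate points is to `tee` each stage of a pipeline to a file so it can be inspected after the run. This is only a sketch and not a substitute for a queryable store like Athena. The event data and file names below are invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical three-stage pipeline over a made-up event log.
# Each stage's output is saved with tee so it can be re-inspected later.
workdir=$(mktemp -d)

printf '%s\n' 'alice,login' 'bob,login' 'alice,logout' 'carol,login' \
  | tee "$workdir/01_raw.csv" \
  | awk -F, '$2 == "login" { print $1 }' \
  | tee "$workdir/02_logins.txt" \
  | sort | uniq -c | sort -rn \
  > "$workdir/03_counts.txt"

# Any stage can now be examined on its own, e.g.:
wc -l "$workdir"/*
cat "$workdir/02_logins.txt"
```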
And bash is really painful once you're trying to do clever things with structured data.
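Here is a minimal illustration of that pain, using a made-up CSV row and hypothetical helper names. Splitting on commas with `cut` or `awk -F,` ignores CSV quoting, so a quoted field that contains a comma gets cut in half.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that pull field N out of one CSV row by splitting on
# every comma. Neither understands CSV quoting.
field_cut() { cut -d, -f"$2" <<< "$1"; }
field_awk() { awk -F, -v n="$2" '{ print $n }' <<< "$1"; }

# Made-up row: the second field is a quoted name containing a comma.
row='42,"Smith, Jane",engineer'

field_cut "$row" 2   # prints "Smith   (only half of the quoted name)
field_awk "$row" 3   # prints  Jane"   (not "engineer")
```

Handling the quoting properly needs a quote-aware parser. That is often the point where moving the logic out of the shell starts to pay off.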

Excellent glue, but there's also a skill in knowing when you should port your increasingly complicated shell script to another language.