by arafa
3381 days ago
"Going from imperative programming to functional programming has been a powerful paradigm shift for us to think about financial processing and accounting. We can now think of this system as a straightforward actor/handler system rather than getting mired in complicated SQL-join logic." Though whether SQL is a functional language (or a programming language at all, if you're talking ANSI SQL) is a subtle question, I would at the very least not describe it as a traditional imperative programming language. I think this is an important distinction for the article because, contrary to what this article suggests, I've found SQL to be quite helpful for understanding functional and declarative programming concepts. That said, it might be a lot easier to express the types of tasks in the article as straight-forward functions rather than getting wrapped up in all this set-based talk in SQL. |
When you program directly against Spark, you're effectively building SQL plans explicitly. It's more indirect: instead of writing a program that does stuff, you write a program that creates a data flow graph that does stuff. It also gives you more responsibility for performance, for good and bad.
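To make the "program that creates a data flow graph" idea concrete, here is a minimal sketch in plain Python. It is not Spark's API: `Plan`, `explain`, and `collect` are hypothetical names for a toy plan builder, and the only point is that chaining calls records steps without touching any data until `collect()` runs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Toy lazy plan (hypothetical, NOT Spark's API): stores steps, runs nothing."""
    source: Tuple
    steps: Tuple = ()

    def map(self, fn):
        # Record the step and return a new plan; no rows are processed here.
        return Plan(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        # Same: only the graph grows.
        return Plan(self.source, self.steps + (("filter", pred),))

    def explain(self):
        # Show the recorded data flow, in order.
        return " -> ".join(["source"] + [kind for kind, _ in self.steps])

    def collect(self):
        # Only now is the recorded pipeline applied to the data.
        rows = iter(self.source)
        for kind, fn in self.steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


plan = Plan((1, 2, 3, 4, 5, 6)).filter(lambda x: x % 2 == 0).map(lambda x: x * 10)
print(plan.explain())  # source -> filter -> map
print(plan.collect())  # [20, 40, 60]
```

A real engine would get to rewrite or reorder the recorded steps before executing them, which is where the extra responsibility for performance comes in; this toy just replays them in order.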
I think to get good performance, you simply can't think on a per-item basis. You need to orient your thinking towards what can be efficiently performed at the bulk level. Whether it's column scanning in HDFS, or index scanning in an RDBMS, you need to be aware of the engineering properties of the operators you're applying. Doing lots of things per-item is a recipe for blowing your budgets, whether it's cache, memory, I/O, whatever. You want to iteratively do a little work to lots of items, and then join, rather than lots of work to each item one at a time.
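A rough sketch of the difference, in plain Python rather than any particular engine. The function names and sample rows are made up for illustration, and the bulk version assumes customer ids are unique. The per-item version rescans the customer list for every order; the bulk version does one cheap pass to build an index, then one pass to join.

```python
from collections import defaultdict


def join_per_item(orders, customers):
    # Lots of work per item: every order triggers a full scan of customers.
    out = []
    for order_id, customer_id, amount in orders:
        for cid, name in customers:
            if cid == customer_id:
                out.append((order_id, name, amount))
    return out


def join_bulk(orders, customers):
    # A little work over everything, then join: build the index once
    # (assumes unique customer ids), then probe it once per order.
    name_by_id = dict(customers)
    return [
        (order_id, name_by_id[cid], amount)
        for order_id, cid, amount in orders
        if cid in name_by_id
    ]


def totals_by_customer(joined):
    # Another single bulk pass over the joined rows.
    totals = defaultdict(int)
    for _, name, amount in joined:
        totals[name] += amount
    return dict(totals)


# Hypothetical sample data: (id, name) and (order_id, customer_id, amount).
customers = [(1, "ann"), (2, "bob")]
orders = [(100, 1, 30), (101, 2, 45), (102, 1, 25), (103, 3, 10)]

joined = join_bulk(orders, customers)
print(joined == join_per_item(orders, customers))  # True
print(totals_by_customer(joined))  # {'ann': 55, 'bob': 45}
```

Both produce the same rows here; the difference is that the per-item version does work proportional to orders times customers, while the bulk version is roughly proportional to orders plus customers.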