by quantumtremor
3635 days ago
> aes(x='np.log(B - A)')

I never understood the pattern of putting source code in a string. Why not just use np.log(B - A) directly and configure the function to accept columns? With strings you lose highlighting and semantic analysis from editors, as well as the ability to know what computations are happening when and where. There seems to be no point and a significant drawback to this. What's the rationale?

Eager evaluation is when the interpreter evaluates the argument np.log(B-A) BEFORE passing it into aes(). aes() can only see the resulting VALUE, not the EXPRESSION itself.
In contrast, lazy evaluation means that aes() gets the raw expression as an argument. It can evaluate it, pass it on to another function, or otherwise manipulate it (e.g. serialize it back to a string).
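To make the difference concrete, here is a minimal Python sketch. The names eager_aes and lazy_aes are made up for illustration and are not ggplot's API; the lazy version assumes the expression arrives as a string and is evaluated with eval() against a namespace the callee chooses.

```python
import math

def eager_aes(x):
    # By the time we get here, x is just a number.
    # The expression that produced it is gone.
    return x

def lazy_aes(x, env):
    # x is source text, so the callee decides when and where to evaluate it,
    # and it still has the text itself to inspect or print.
    value = eval(x, {"math": math}, env)
    return x, value

A, B = 1.0, 3.0
eager_result = eager_aes(math.log(B - A))            # evaluated by the caller
expr, lazy_result = lazy_aes("math.log(B - A)", {"A": A, "B": B})
```

Both calls end up with the same number, but only the lazy one still has the expression text to work with.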
In the case of ggplot, this is used to actually evaluate the expression at DIFFERENT values, so you can plot it. Suppose you want to plot f(x) = square(x). It doesn't make any sense to write plot(square(x)), because if x = 5.0, you will get plot(25.0), and you can't make the plot. plot(lambda x: square(x)) makes more sense, because then the plot() function can evaluate it with 100 different values of x to get 100 pixel values.
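A rough sketch of the lambda version, with a hypothetical sample() helper standing in for the part of plot() that picks the x values (no real plotting library is involved):

```python
def square(x):
    return x * x

def sample(f, lo, hi, n=100):
    # Evaluate f at n evenly spaced points in [lo, hi].
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    return xs, [f(x) for x in xs]

# The callee, not the caller, decides which x values to use.
xs, ys = sample(lambda x: square(x), 0.0, 5.0)
```

Passing square(5.0) instead would hand sample() a single float, and there would be nothing left to call.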
And it's also used to print the expression on the axes. That is, you actually want to print the expression "square(x)" on the graph. You don't want to print "25.0" on the graph -- that makes no sense.
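This is the one thing a lambda can't easily give you: the text. A sketch of a hypothetical describe() helper (again, not any real plotting API) that uses the same string both as the axis label and as the thing to evaluate:

```python
def square(x):
    return x * x

def describe(expr, xs):
    # Compile once, then evaluate the same expression for every x.
    code = compile(expr, "<expr>", "eval")
    ys = [eval(code, {"square": square}, {"x": x}) for x in xs]
    # The original text doubles as the axis label.
    return {"label": expr, "ys": ys}

result = describe("square(x)", [0.0, 1.0, 2.0])
```

Here result["label"] is the string "square(x)" and result["ys"] holds the computed values.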
This is related to the concept of quotation in Lisp. Quotations are UNEVALUATED program fragments. In Lisp a quotation is an AST; Python and most other dynamic languages have no quote syntax, so the expression is usually passed as a string (or wrapped in a lambda, if all you need is to evaluate it later). In C there is no way to do this (short of shelling out to the C compiler at runtime, which some people have actually done...).
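For what it's worth, once you have the string, Python's standard ast module can turn it into a tree after the fact, which gets you part of the way toward what a Lisp quotation offers. A small sketch (ast.unparse needs Python 3.9 or later):

```python
import ast

# Parse the quoted expression into an AST without evaluating it.
tree = ast.parse("np.log(B - A)", mode="eval")

# Walk the tree to find which names the expression refers to.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

# Serialize the tree back to source text.
round_trip = ast.unparse(tree)
```

A library could use names to work out which columns the expression needs before evaluating anything.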
[1] R has crazy caveats, but that's beyond the scope of this post...