Here is an implementation of a memoize decorator in Python that will support all inputs [1]. You'd have to modify to not use all the Pylons framework stuff.
One way to optimize performance is to change program text from one thing into something else. In this case, "function calls" refers to the text before the transformation is applied.
It seems reasonable to me to describe an optimization by identifying where it can be applied. If someone wrote a blog post titled "optimizing multiplication by 2" and went on to describe changing "x * 2" into "x + x", there would be nothing unusual about that. It would no longer be multiplication, but we wouldn't necessarily expect it to be.
I agree it can be interpreted another way in this case, though. So while I maintain it is not wrong, I admit it is ambiguous. Someone could have used the same title had they written about a one-line optimization to the Python interpreter.
It only speeds up calls to pure (side-effect free) functions, that have been executed before with these exact parameters and where the result is still cached. All other calls get slower.
Also, there's an argument to be made that "function call" is the overhead involved in moving control from the call site to the callee (which can be significant in interpreted languages such as Python). That's at least how I interpreted it at first.
It speeds up recursive functions that haven’t been called before with the same arguments. In this case, it caches all the intermediate results that would otherwise have to be recalculated.
No, you have to be more specific than "recursive functions", many recursive functions will see no such benefit. A simple example is factorial function: adding @lru_cache to that will only slow it down.
This technique ("memoization") is a technique from dynamic programming, and it only works on very particular types of functions. The two criteria are:
* Optimal substructure: the function has to be a recursive function that can be solved by solving "smaller" versions of the same function (this is roughly what people mean when they say a function can be solved "recursively")
* Overlapping subproblems: the smaller sub-problems have to overlap, so that memoizing them actually introduces a benefit. For instance, in order to calculate F(10) (F(n) being the Fibonacci sequence), you have to calculate F(9) and F(8), and in order to calculate F(9), you have to calculate F(8) and F(7). That means that F(10)'s subproblems overlap with F(9)'s (both require F(8)).
For the factorial function, the first is true, the second is not. There are no overlapping subproblems for calculating the factorial, so memoizing the function in order to speed it up is pointless.
This is a very particular kind of problem. It's a type of problem you learn about in computer science class, have to face in terrible developer job interviews, and find all over the place on Project Euler. It's is very rare that you actually encounter it in the wild. Giving the advice "@lru_cache speeds up functions!" is misguided: it only speeds up a very particular kind of function. It slows down basically all other pure functions. Also: if you've identified a problem of this kind, memoization is not the fastest way to solve it: turning the recursion upside-down and iterating usually is.
The real use-case for @lru_cache, IMHO, is to use it as an actual cache: like, caching database entries or other data you'd need to do I/O to fetch. That is a much more reasonable use case.
I agree, I think of function call as the time it takes to copy the stack frame, setup variables, and move the program counter. I'm not sure that adding a decorator slows this down, but I suspect it does. A better title would just be 'add memoization in one line in python' or something.
Just a side note: with Fibonacci + caching it solidly become Dynamic programming problem so Time complexity reduces from quadratic to o(n), IIRC .
There is a whole class of problems where
recursion + memoization(caching) = Top Down Dynamic programming ,
The other way to Increase performance and actually reduce call stack in these class of problems including Fibonacci would be Bottom Up Dynamic Programming
The follow-up posts go into even more detail on individual problems that can be solved using either form of dynamic programming, including some real-world problems like content-aware image resizing.
Thank you for the link, I enjoyed the post a lot. The drawings you've done add a lot to the explanation. Do you have any other deep dives you've done in the same style that you'd recommend?
Yes, I have mentioned DP in my article as just reference. The Purpose of article was to show that how easy it is to implement caching in python using lru_cache decorator which is in std lib. So, We don't have to install any 3rd party lib.
Edit: I don't actually have much of a problem with the article headline as it is now. It depends a lot on your target audience! For Hacker News, yeah, we know what memoization is because we learned it in CS 102, right after we learned about recursion.
But for a lot of people who would get the most value from the article, the word "memoization" isn't going to mean much, and wouldn't read the article.
Maybe something like: "Memoization in Python, or how I learned to speed up function calls in just one line"
I had worked on a open source project called 'safecache' on the similar note. As others has already commented, @lru_cache does not play well with mutable data structures. So my implementation handles for both immutable and mutable data structures as well as multi-threaded operations.
Linear means that f(n) is a linear function of f(n-1)...f(n-k). This means that you can calculate f(n)...f(n-k+1) in terms of a matrix multiply on f(n-1)...f(n-k) (most of the matrix is empty, with ones just off the diagonal). Taking powers of the matrix lets you take more steps. Diagonalizing the matrix lets you take powers in constant time. (You may have to worry about precision. Often rounding to nearest integer after that fixes things though).
If you are worried about that, or just don't have floating point, the "fastexp" algorithm turns it into logarithmic time.
If A and B are a solution, then A+B is also a solution. For a constant r, if C is a solution, r*C is also solution.
That's it. The nice closed form of exponentials as solutions is just for linear difference equations with constant coefficients. In that case, you can always find a solution by trying λ^n and seeing what that forces λ to be -- it'll be a root of some polynomial.
It does not! It fails very early (I tested it once in Python, and it failed on something like F(70) or something) and should not ever be used for practical calculation of Fibonacci numbers.
For small values, the iterative version works just fine. For larger values, there are other methods: one obvious one is using the matrix form of the Fibonacci numbers, which just requires you to take an exponent of a 2x2 matrix (which you can do quickly because of exponentiation by squaring).
The Fibonacci series grows so fast that for "large" values of N you'll need a arbitrary size integer implementation anyway, so at that point you might as well go for a (non-IEEE) arbitrary size float type and get all the significant digits you need.
This memoizes closures of functions, and lets them be executed on other processors? Does it support dependency tracking between memoized closures (incremental recomputation), or do I have to roll that, myself?
Unfortunately you need to roll that yourself in joblib. But it is automatic in bionic (https://github.com/square/bionic), assuming your workflow can be structured in the way bionic likes.
After, reading your comment. I cross checked result and found out that 3 is most optimal size. I even ran fib(40) on it. with size 2, There are many misses but after 3 and on wards misses are constant(if fib(40), then misses are only 40 which emulates DP approach of O(N)).
Why 3 is optimal, because of how recursion and LRU work. I wish i can explain it using animation.
It makes sense. f(n) depends on f(n-1) and f(n-2). So if the cache is able to produce these 2 values, you basically get the linear algorithm from the article. I assume the running f(n) also takes up a cache slot, hence 3 instead of 2.
If this theory is correct, every recursive function f(n) requiring access to f(n-x) should have x+1 as maximum usefull cache size.
I think it's likely that the author's analysis of why 5 is the optimal cache size heavily depends on the fact that they only tried fib(10). For example, it's clear that cache 20 and cache 30 are identical if your universe of possible inputs is only [0 .. 10], haha.
Because of the way the recursion goes down the tree depth-first, I think 5 basically cuts down the computation to fib(10-5) = fib(5) which is pretty manageable, so the author couldn't really see any further measurable performance gains by increasing the cache further. I think for fib(35) it'd be clear that cache size 35 would help compared to cache size 5. (I picked 35 instead of, say, 300, because I think 300 would just not finish with cache size 5, it'd take forever haha.)
Maybe I'm abusing lru_cache but another use for it is debouncing.
We had a chatbot that polls a server and sends notifications, but due to clock skew it would sometimes send two notifications. So I just added the lru_cache decorator to the send(username, message) function to prevent that.
I noticed that anytime I thought I was clever by using memoization on a function with side-effects, that it in fact could become very complicated very quickly. It's a bit similar in your situation: What if someone actually wants to send the same message twice? Depending on your codebase it could be hard to debug the issue which will arise.
One thing where I do think it is kind of safe is for loading a file into a class instance. The memoization will avoid defining (1) a method to check to see if the file is loaded, (2) a method to load the file and (3) a class variable which points to the loaded file.
That sounds very interesting. Can you detail the problem a bit (I'm not sure I understand how clock skew affects python code) and how lru_cache decorator fixed it?
It's not clock skew of the CPU/Python in any sense. My theory: Their script is run every 10seconds without taking into account the execution time of previous runs, and they noticed duplicates as the same message would be picked up on multiple executions of their polling script because of probably network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates according to the message content and the user that made it. Really bad design if you ask me, as it's essentially "throwing spaghetti" onto the wall instead of figuring out what's really going wrong. As other people have mentioned, they should instead be using a plain ID to filter out duplicates because it is the most unique identifier.
More specifically, it prevents a call being repeated _in quick succession_, if there are enough calls in between repetitions it'll fall out of the cache and be reprocessed.
Maybe clock skew is the wrong word. Basically we'd run the script every 10 seconds, which would use the Jira API to fetch any comments made in the last 10 seconds. So sometimes Jira would return the same comment twice, causing the script to send out two notifications, about 10 seconds apart.
That sounds to me like a Push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events like a new comment. That way you should be able to avoid the concurrency problems completely.
That comment entry that you get back from JIRA should have a unique "ID" attached to it that you can use to do a duplicate check on. According to you, your function signature is send(username, message), in which case this solution will fail if the same user makes the same comment on two different issues.
Have a look at their REST API docs for retrieving comments on an Issue:
You'll see that they respond with an ID for each comment. That ID is unique and probably the PK of the comment on the JIRA application's DB. That is the the key that you need to use to do a duplicate check. Not username and message content.
I checked the code (probably should have done that in the first place) and it turns out we include a Jira link in the message, which contains the comment ID.
This is neat atho I would agree it isn’t the right tool. The reason is that the time until a second message with the same content can be sent is indeterminate, based on how many intervening messages are sent. If your app does want to send the same notice twice, with an hour in between, and there are no intervening messages, it would fail, so the behavior is unpredictable.
A timed algorithm seems better suited. That said, it probably doesn’t matter much.
Do you mean in the send() function? No sorry I'm talking about a chatbot that forwards Jira comments into a chatroom, so usually the combination of the username and the message is unique enough on its own to produce a good hash for lru_cache without needing a message_id if that's what you're referring to.
Functools’ lru_cache also has good methods for getting more info about the cache’s utilisation (.cache_info() I think), which is quite helpful when found in logs.
lru_cache doesn't work with lists or dicts, or indeed any non-hashable data, so it's not quite a transparent change. About half the time I use a cache I end up implementing my own.
This is neat and I learned something new. TLDR: use the functools.lru_cache decorator to add caching to slow functions.
I must admit I was hoping for a general approach to speeding up all function calls in python. Functions are the primary mechanism for abstraction and yet calls are relatively heavy in their own right. It would be neat if python had a way to do automatic inlining or some such optimization so that I could have my abstractions but avoid the performance hit of a function call (even at the expense of more byte code).
A benefit of Python is that it removes consideration of low level type details. You can just code your problem and think at a higher level, in terms of larger custom classes etc. This is great for developer speed. But to improve performance of code, you need to micro manage your types, you need to understand details about the compiler and the processor, you need to know what the machine is doing to the data to service your particular process, and streamline it accordingly.
Is there a magic function? If one crops up, it would get baked into Python, development of which is extremely active. So magic functions trend obsolete. The main thing developers can consistently do to improve performance, is micro manage types and algorithms to improve the performance of bottlenecks. This is done with a tool like Cython, but you have to learn the discipline of coding for performance, you have to learn the details of what the machine is doing with the data in your particular piece of code. When do you want to replace linked lists with arrays? When are you doing unnecessary type conversion? How overweight are your high frequency objects? How is memory being used? Are there more efficient ways on my microarchitecture to get this calculation done... Performance coding is a discipline like application coding.
Is this a problem with Python? I don't think so, because performance comes second in the developer process. Write the code first then optimize the real bottlenecks.
The last bit of deterministic functions is a main selling point of functional programming (and get help from compilers) or any design that promotes creating pure functions.
Okay, at least got the decency of providing a TL;DR. But if your summary is literally three words, why not put it in the title: “speed up Python function calls with functools.lru_cache”?
Thanks for similar articles. I am aware of similar articles and that's why I have kept it short and simple. I will add these as references, later.
I have enjoyed reading Python Tricks[0] book by Dan bader (author of https://dbader.org/blog/python-memoization ). It's awesome. Might be outdated but I still recommend it to any body who has just started learning python.
Here is an implementation of a memoize decorator in Python that will support all inputs [1]. You'd have to modify to not use all the Pylons framework stuff.
[0] https://en.wikipedia.org/wiki/Memoization
[1] https://github.com/reddit-archive/reddit/blob/master/r2/r2/l...