Speeding up function calls with lru_cache in Python

Denvercoder9 2185 days ago

Caching the result is not speeding up function calls.

adrianmonk 2185 days ago

One way to optimize performance is to change program text from one thing into something else. In this case, "function calls" refers to the text before the transformation is applied.

It seems reasonable to me to describe an optimization by identifying where it can be applied. If someone wrote a blog post titled "optimizing multiplication by 2" and went on to describe changing "x * 2" into "x + x", there would be nothing unusual about that. It would no longer be multiplication, but we wouldn't necessarily expect it to be.

I agree it can be interpreted another way in this case, though. So while I maintain it is not wrong, I admit it is ambiguous. Someone could have used the same title had they written about a one-line optimization to the Python interpreter.

jhrmnn 2185 days ago

It is, though. It doesn’t speed up function execution, but it literally speeds up the function call.

Denvercoder9 2185 days ago

It only speeds up calls to pure (side-effect free) functions, that have been executed before with these exact parameters and where the result is still cached. All other calls get slower.

Also, there's an argument to be made that "function call" is the overhead involved in moving control from the call site to the callee (which can be significant in interpreted languages such as Python). That's at least how I interpreted it at first.

empath75 2185 days ago

It speeds up recursive functions that haven’t been called before with the same arguments. In this case, it caches all the intermediate results that would otherwise have to be recalculated.

OskarS 2185 days ago

No, you have to be more specific than "recursive functions", many recursive functions will see no such benefit. A simple example is factorial function: adding @lru_cache to that will only slow it down.

This technique ("memoization") is a technique from dynamic programming, and it only works on very particular types of functions. The two criteria are:

* Optimal substructure: the function has to be a recursive function that can be solved by solving "smaller" versions of the same function (this is roughly what people mean when they say a function can be solved "recursively")

* Overlapping subproblems: the smaller sub-problems have to overlap, so that memoizing them actually introduces a benefit. For instance, in order to calculate F(10) (F(n) being the Fibonacci sequence), you have to calculate F(9) and F(8), and in order to calculate F(9), you have to calculate F(8) and F(7). That means that F(10)'s subproblems overlap with F(9)'s (both require F(8)).

For the factorial function, the first is true, the second is not. There are no overlapping subproblems for calculating the factorial, so memoizing the function in order to speed it up is pointless.

This is a very particular kind of problem. It's a type of problem you learn about in computer science class, have to face in terrible developer job interviews, and find all over the place on Project Euler. It's is very rare that you actually encounter it in the wild. Giving the advice "@lru_cache speeds up functions!" is misguided: it only speeds up a very particular kind of function. It slows down basically all other pure functions. Also: if you've identified a problem of this kind, memoization is not the fastest way to solve it: turning the recursion upside-down and iterating usually is.

The real use-case for @lru_cache, IMHO, is to use it as an actual cache: like, caching database entries or other data you'd need to do I/O to fetch. That is a much more reasonable use case.

gtm1260 2185 days ago

I agree, I think of function call as the time it takes to copy the stack frame, setup variables, and move the program counter. I'm not sure that adding a decorator slows this down, but I suspect it does. A better title would just be 'add memoization in one line in python' or something.

kevin_thibedeau 2185 days ago

It speeds up algorithms that are amenable to memoization.

Author here. Sorry, If it feels misleading. I have no intention to mislead anybody. If you can suggest alternative title, I am happy to change title.

shankysingh 2185 days ago

Thanks for article

Just a side note: with Fibonacci + caching it solidly become Dynamic programming problem so Time complexity reduces from quadratic to o(n), IIRC .

There is a whole class of problems where recursion + memoization(caching) = Top Down Dynamic programming , The other way to Increase performance and actually reduce call stack in these class of problems including Fibonacci would be Bottom Up Dynamic Programming

Some gists I found on it https://gist.github.com/trtg/4662449

akdas 2185 days ago

Plugging my own writing, but I did a deep dive on top-down vs. bottom-up dynamic programming on my blog: https://avikdas.com/2019/04/15/a-graphical-introduction-to-d...

The follow-up posts go into even more detail on individual problems that can be solved using either form of dynamic programming, including some real-world problems like content-aware image resizing.

eindiran 2185 days ago

Thank you for the link, I enjoyed the post a lot. The drawings you've done add a lot to the explanation. Do you have any other deep dives you've done in the same style that you'd recommend?

akdas 2185 days ago

Thanks! I'm glad the drawings were helpful.

Most of my writing can be found at my blog (https://avikdas.com/), but here are some from the same dynamic programming series:

- Deep-dive into the Chain Matrix Multiplication Problem - https://avikdas.com/2019/04/25/dynamic-programming-deep-dive... - Real-world applications of DP: https://avikdas.com/2019/05/14/real-world-dynamic-programmin... and https://avikdas.com/2019/07/29/improved-seam-carving-with-fo... - Another real-world application, this time in machine learning - https://avikdas.com/2019/06/24/dynamic-programming-for-machi...

If you look on my blog, you'll also see my recent series is on scalability, things like read-after-write consistency and queues for reliability.

Yes, I have mentioned DP in my article as just reference. The Purpose of article was to show that how easy it is to implement caching in python using lru_cache decorator which is in std lib. So, We don't have to install any 3rd party lib.

shankysingh 2185 days ago

oh sorry , somehow I missed it :)

anandoza 2185 days ago

the simple recursive solution is actually O(fib(n)), which is exponential

Denvercoder9 2185 days ago

I'd suggest something along the lines of "Speeding up functions [in Python] by caching results" :)

fwip 2185 days ago

"python memoization in one line"

Edit: I don't actually have much of a problem with the article headline as it is now. It depends a lot on your target audience! For Hacker News, yeah, we know what memoization is because we learned it in CS 102, right after we learned about recursion.

But for a lot of people who would get the most value from the article, the word "memoization" isn't going to mean much, and wouldn't read the article.

Maybe something like: "Memoization in Python, or how I learned to speed up function calls in just one line"

https://github.com/Verizon/safecache

initbar 2185 days ago

I had worked on a open source project called 'safecache' on the similar note. As others has already commented, @lru_cache does not play well with mutable data structures. So my implementation handles for both immutable and mutable data structures as well as multi-threaded operations.

vorticalbox 2185 days ago

Thanks for this.

gammarator 2185 days ago

(Fibonacci numbers have a closed form analytic solution that can be computed in constant time: https://www.evanmiller.org/mathematical-hacker.html)

magnio 2185 days ago

Yep, this is called Binet formula, and actually all linear recurrence sequences have one.

throwaway_pdp09 2185 days ago

What defines 'linear' here?

nimish 2185 days ago

The solutions form a vector space

throwaway_pdp09 2185 days ago

I regret that does not elucidate. Does linear mean additions only? Or non-exponentiation, or something else?

wnoise 2185 days ago

Linear means that f(n) is a linear function of f(n-1)...f(n-k). This means that you can calculate f(n)...f(n-k+1) in terms of a matrix multiply on f(n-1)...f(n-k) (most of the matrix is empty, with ones just off the diagonal). Taking powers of the matrix lets you take more steps. Diagonalizing the matrix lets you take powers in constant time. (You may have to worry about precision. Often rounding to nearest integer after that fixes things though). If you are worried about that, or just don't have floating point, the "fastexp" algorithm turns it into logarithmic time.

nimish 2185 days ago

If A and B are a solution, then A+B is also a solution. For a constant r, if C is a solution, r*C is also solution.

That's it. The nice closed form of exponentials as solutions is just for linear difference equations with constant coefficients. In that case, you can always find a solution by trying λ^n and seeing what that forces λ to be -- it'll be a root of some polynomial.

nimithryn 2185 days ago

Does this work in practice for large values of n? You will be limited by numerical error in phi.

OskarS 2185 days ago

It does not! It fails very early (I tested it once in Python, and it failed on something like F(70) or something) and should not ever be used for practical calculation of Fibonacci numbers.

For small values, the iterative version works just fine. For larger values, there are other methods: one obvious one is using the matrix form of the Fibonacci numbers, which just requires you to take an exponent of a 2x2 matrix (which you can do quickly because of exponentiation by squaring).

WJW 2185 days ago

The Fibonacci series grows so fast that for "large" values of N you'll need a arbitrary size integer implementation anyway, so at that point you might as well go for a (non-IEEE) arbitrary size float type and get all the significant digits you need.

nimithryn 2185 days ago

You’ll still need to compute the digits of Phi though, and then exponentiate that. My intuition is that using ints is still faster.

jackblemming 2185 days ago

Calculating powers isn't constant, as far as I know.

bravura 2185 days ago

I recently discovered that joblib can do something similar, both on disk and in memory: https://joblib.readthedocs.io/en/latest/memory.html

thechao 2185 days ago

This memoizes closures of functions, and lets them be executed on other processors? Does it support dependency tracking between memoized closures (incremental recomputation), or do I have to roll that, myself?

trombonechamp 2185 days ago

Unfortunately you need to roll that yourself in joblib. But it is automatic in bionic (https://github.com/square/bionic), assuming your workflow can be structured in the way bionic likes.

thechao 2185 days ago

Bionic is what I'm looking for (right on their use-case page): a Make replacement that can work with intermediate computations, and not just files.

anandoza 2185 days ago

> As, we can see the optimal cache size of fib function is 5. Increasing cache size will not result in much gain in terms of speedup.

Try it with fib(35), curious what you find.

After, reading your comment. I cross checked result and found out that 3 is most optimal size. I even ran fib(40) on it. with size 2, There are many misses but after 3 and on wards misses are constant(if fib(40), then misses are only 40 which emulates DP approach of O(N)).

Why 3 is optimal, because of how recursion and LRU work. I wish i can explain it using animation.

You can play with it. https://repl.it/repls/NocturnalIroncladBytecode#main.py

hyperman1 2185 days ago

It makes sense. f(n) depends on f(n-1) and f(n-2). So if the cache is able to produce these 2 values, you basically get the linear algorithm from the article. I assume the running f(n) also takes up a cache slot, hence 3 instead of 2.

If this theory is correct, every recursive function f(n) requiring access to f(n-x) should have x+1 as maximum usefull cache size.

ptr 2185 days ago

Why would fib(35) be more optimal with a differently sized cache? And how come _5_ is optimal? Wouldn’t 2 be all you need? What am I missing?

anandoza 2185 days ago

I think it's likely that the author's analysis of why 5 is the optimal cache size heavily depends on the fact that they only tried fib(10). For example, it's clear that cache 20 and cache 30 are identical if your universe of possible inputs is only [0 .. 10], haha.

Because of the way the recursion goes down the tree depth-first, I think 5 basically cuts down the computation to fib(10-5) = fib(5) which is pretty manageable, so the author couldn't really see any further measurable performance gains by increasing the cache further. I think for fib(35) it'd be clear that cache size 35 would help compared to cache size 5. (I picked 35 instead of, say, 300, because I think 300 would just not finish with cache size 5, it'd take forever haha.)

Maybe I'm abusing lru_cache but another use for it is debouncing.

We had a chatbot that polls a server and sends notifications, but due to clock skew it would sometimes send two notifications. So I just added the lru_cache decorator to the send(username, message) function to prevent that.

huijzer 2185 days ago

I noticed that anytime I thought I was clever by using memoization on a function with side-effects, that it in fact could become very complicated very quickly. It's a bit similar in your situation: What if someone actually wants to send the same message twice? Depending on your codebase it could be hard to debug the issue which will arise.

One thing where I do think it is kind of safe is for loading a file into a class instance. The memoization will avoid defining (1) a method to check to see if the file is loaded, (2) a method to load the file and (3) a class variable which points to the loaded file.

GeoAtreides 2185 days ago

That sounds very interesting. Can you detail the problem a bit (I'm not sure I understand how clock skew affects python code) and how lru_cache decorator fixed it?

zo1 2185 days ago

It's not clock skew of the CPU/Python in any sense. My theory: Their script is run every 10seconds without taking into account the execution time of previous runs, and they noticed duplicates as the same message would be picked up on multiple executions of their polling script because of probably network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates according to the message content and the user that made it. Really bad design if you ask me, as it's essentially "throwing spaghetti" onto the wall instead of figuring out what's really going wrong. As other people have mentioned, they should instead be using a plain ID to filter out duplicates because it is the most unique identifier.

whalesalad 2185 days ago

Essentially the OP is using the decorator to prevent the function call from ever being run more than one time. It’s at most once.

faceplanted 2185 days ago

More specifically, it prevents a call being repeated _in quick succession_, if there are enough calls in between repetitions it'll fall out of the cache and be reprocessed.

Maybe clock skew is the wrong word. Basically we'd run the script every 10 seconds, which would use the Jira API to fetch any comments made in the last 10 seconds. So sometimes Jira would return the same comment twice, causing the script to send out two notifications, about 10 seconds apart.

schuppentier 2185 days ago

That sounds to me like a Push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events like a new comment. That way you should be able to avoid the concurrency problems completely.

https://docs.atlassian.com/software/jira/docs/api/REST/8.10....

zo1 2185 days ago

That comment entry that you get back from JIRA should have a unique "ID" attached to it that you can use to do a duplicate check on. According to you, your function signature is send(username, message), in which case this solution will fail if the same user makes the same comment on two different issues.

Have a look at their REST API docs for retrieving comments on an Issue:

You'll see that they respond with an ID for each comment. That ID is unique and probably the PK of the comment on the JIRA application's DB. That is the the key that you need to use to do a duplicate check. Not username and message content.

I checked the code (probably should have done that in the first place) and it turns out we include a Jira link in the message, which contains the comment ID.

crystaln 2185 days ago

This is neat atho I would agree it isn’t the right tool. The reason is that the time until a second message with the same content can be sent is indeterminate, based on how many intervening messages are sent. If your app does want to send the same notice twice, with an hour in between, and there are no intervening messages, it would fail, so the behavior is unpredictable.

A timed algorithm seems better suited. That said, it probably doesn’t matter much.

andreareina 2185 days ago

Are you not using a message id of some sort?

Do you mean in the send() function? No sorry I'm talking about a chatbot that forwards Jira comments into a chatroom, so usually the combination of the username and the message is unique enough on its own to produce a good hash for lru_cache without needing a message_id if that's what you're referring to.

raymondh 2185 days ago

Well, that's pretty cool.

drej 2185 days ago

Functools’ lru_cache also has good methods for getting more info about the cache’s utilisation (.cache_info() I think), which is quite helpful when found in logs.

CyberDildonics 2185 days ago

One line summary: Use lru_cache decorator

@functools.lru_cache(maxsize=31)

def fib(n):

andreareina 2185 days ago

lru_cache doesn't work with lists or dicts, or indeed any non-hashable data, so it's not quite a transparent change. About half the time I use a cache I end up implementing my own.

raymondh 2185 days ago

What approach are you using to efficiently store and look up non-hashable function arguments?

andreareina 2185 days ago

Converting to tuple has mostly been enough. I want to say always, but I don't trust my memory that much, haha.

intrepidhero 2185 days ago

This is neat and I learned something new. TLDR: use the functools.lru_cache decorator to add caching to slow functions.

I must admit I was hoping for a general approach to speeding up all function calls in python. Functions are the primary mechanism for abstraction and yet calls are relatively heavy in their own right. It would be neat if python had a way to do automatic inlining or some such optimization so that I could have my abstractions but avoid the performance hit of a function call (even at the expense of more byte code).

riazrizvi 2185 days ago

A benefit of Python is that it removes consideration of low level type details. You can just code your problem and think at a higher level, in terms of larger custom classes etc. This is great for developer speed. But to improve performance of code, you need to micro manage your types, you need to understand details about the compiler and the processor, you need to know what the machine is doing to the data to service your particular process, and streamline it accordingly.

Is there a magic function? If one crops up, it would get baked into Python, development of which is extremely active. So magic functions trend obsolete. The main thing developers can consistently do to improve performance, is micro manage types and algorithms to improve the performance of bottlenecks. This is done with a tool like Cython, but you have to learn the discipline of coding for performance, you have to learn the details of what the machine is doing with the data in your particular piece of code. When do you want to replace linked lists with arrays? When are you doing unnecessary type conversion? How overweight are your high frequency objects? How is memory being used? Are there more efficient ways on my microarchitecture to get this calculation done... Performance coding is a discipline like application coding.

Is this a problem with Python? I don't think so, because performance comes second in the developer process. Write the code first then optimize the real bottlenecks.

korijn 2185 days ago

You might be interested in pypy or nuitka

satyanash 2185 days ago

What's the ruby equivalent of `functools` and specifically `functools.lru_cache`?

Note that `lru_cache` doesn't just do caching, it also provides convenient methods on the original function to get cache stats etc in a pythonic way.

lalos 2185 days ago

The last bit of deterministic functions is a main selling point of functional programming (and get help from compilers) or any design that promotes creating pure functions.

oefrha 2185 days ago

> One line summary: Use lru_cache decorator

Okay, at least got the decency of providing a TL;DR. But if your summary is literally three words, why not put it in the title: “speed up Python function calls with functools.lru_cache”?

God I hate clickbait.

m4r35n357 2185 days ago

Huh? He sped it up much more by rewriting as a loop . . .

aresic 2185 days ago

dvic CV function on problems

helloxxx123 2185 days ago

Cool

fastball 2185 days ago

tl;dr – memoization.

[0] https://www.mediawiki.org/wiki/Naming_things

asicsp 2185 days ago

Instead of the last quote in the article, I prefer this one (got it from [0])

>"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors." – Martin Fowler

And there's plenty of similar articles, for example [1] [2]

[1] https://dbader.org/blog/python-memoization

[2] https://mike.place/2016/memoization/

ritter2a 2185 days ago

Even in this version, the quote still feels incomplete without someone shouting "Concurrency!" while it is delivered...

SCLeo 2185 days ago

Concurrency.""There are three hard things in computer science: cache invalidation, naming things, off-by-one errors, and

[0] https://realpython.com/products/python-tricks-book/

Thanks for similar articles. I am aware of similar articles and that's why I have kept it short and simple. I will add these as references, later.

I have enjoyed reading Python Tricks[0] book by Dan bader (author of https://dbader.org/blog/python-memoization ). It's awesome. Might be outdated but I still recommend it to any body who has just started learning python.

asicsp 2185 days ago

Hey, sorry if my post came off wrongly. In hindsight, I should've worded better, something like "similar articles for further reading".

fanf2 2185 days ago

That attribution is wrong - the mediawiki page links to Martin Fowler’s collection of quotes, where he says the off-by-one version came from Leon Bambrick https://twitter.com/secretgeek/status/7269997868?s=21

asicsp 2185 days ago

thanks for the correction, I cannot edit the post though :(

perfunctory 2185 days ago

> @functools.lru_cache

Avoid these tricks if you care about thread safety.