Hacker News new | ask | show | jobs
by mac-chaffee 2186 days ago
Maybe I'm abusing lru_cache but another use for it is debouncing.

We had a chatbot that polls a server and sends notifications, but due to clock skew it would sometimes send two notifications. So I just added the lru_cache decorator to the send(username, message) function to prevent that.

6 comments

I noticed that anytime I thought I was clever by using memoization on a function with side-effects, that it in fact could become very complicated very quickly. It's a bit similar in your situation: What if someone actually wants to send the same message twice? Depending on your codebase it could be hard to debug the issue which will arise.

One thing where I do think it is kind of safe is for loading a file into a class instance. The memoization will avoid defining (1) a method to check to see if the file is loaded, (2) a method to load the file and (3) a class variable which points to the loaded file.

That sounds very interesting. Can you detail the problem a bit (I'm not sure I understand how clock skew affects python code) and how lru_cache decorator fixed it?
It's not clock skew of the CPU/Python in any sense. My theory: Their script is run every 10seconds without taking into account the execution time of previous runs, and they noticed duplicates as the same message would be picked up on multiple executions of their polling script because of probably network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates according to the message content and the user that made it. Really bad design if you ask me, as it's essentially "throwing spaghetti" onto the wall instead of figuring out what's really going wrong. As other people have mentioned, they should instead be using a plain ID to filter out duplicates because it is the most unique identifier.
Essentially the OP is using the decorator to prevent the function call from ever being run more than one time. It’s at most once.
More specifically, it prevents a call being repeated _in quick succession_, if there are enough calls in between repetitions it'll fall out of the cache and be reprocessed.
Maybe clock skew is the wrong word. Basically we'd run the script every 10 seconds, which would use the Jira API to fetch any comments made in the last 10 seconds. So sometimes Jira would return the same comment twice, causing the script to send out two notifications, about 10 seconds apart.
That sounds to me like a Push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events like a new comment. That way you should be able to avoid the concurrency problems completely.
That comment entry that you get back from JIRA should have a unique "ID" attached to it that you can use to do a duplicate check on. According to you, your function signature is send(username, message), in which case this solution will fail if the same user makes the same comment on two different issues.

Have a look at their REST API docs for retrieving comments on an Issue:

https://docs.atlassian.com/software/jira/docs/api/REST/8.10....

You'll see that they respond with an ID for each comment. That ID is unique and probably the PK of the comment on the JIRA application's DB. That is the the key that you need to use to do a duplicate check. Not username and message content.

I checked the code (probably should have done that in the first place) and it turns out we include a Jira link in the message, which contains the comment ID.
This is neat atho I would agree it isn’t the right tool. The reason is that the time until a second message with the same content can be sent is indeterminate, based on how many intervening messages are sent. If your app does want to send the same notice twice, with an hour in between, and there are no intervening messages, it would fail, so the behavior is unpredictable.

A timed algorithm seems better suited. That said, it probably doesn’t matter much.

Are you not using a message id of some sort?
Do you mean in the send() function? No sorry I'm talking about a chatbot that forwards Jira comments into a chatroom, so usually the combination of the username and the message is unique enough on its own to produce a good hash for lru_cache without needing a message_id if that's what you're referring to.
Well, that's pretty cool.