HN Mirror

Hacker News


	by mac-chaffee 2186 days ago
	Maybe I'm abusing lru_cache, but another use for it is debouncing. We had a chatbot that polls a server and sends notifications, but due to clock skew it would sometimes send two notifications. So I just added the lru_cache decorator to the send(username, message) function to prevent that.
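A minimal sketch of the trick described above. `post_to_chat` is a hypothetical stand-in for whatever actually delivers the notification. `functools.lru_cache` keys on the call arguments, so a second call with the same `(username, message)` pair returns the cached result (here `None`) without running the function body again.

```python
from functools import lru_cache

# Hypothetical stand-in for the chatbot's real delivery code.
sent = []

def post_to_chat(username, message):
    sent.append((username, message))

@lru_cache(maxsize=128)  # 128 is also lru_cache's default maxsize
def send(username, message):
    # The body only runs on a cache miss, i.e. the first time this exact
    # (username, message) pair is seen, or after it has been evicted.
    post_to_chat(username, message)

send("alice", "Build is green")
send("alice", "Build is green")  # cache hit: body skipped, nothing posted
send("bob", "Build is green")    # different key: posted
```

Both arguments must be hashable, and suppression only lasts as long as the entry stays in the cache.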

6 comments

huijzer 2186 days ago

I've noticed that any time I thought I was being clever by memoizing a function with side effects, things could in fact get very complicated very quickly. Your situation is a bit similar: what if someone actually wants to send the same message twice? Depending on your codebase, the resulting issue could be hard to debug.

One case where I do think it is fairly safe is loading a file into a class instance. Memoization saves you from defining (1) a method to check whether the file is loaded, (2) a method to load the file, and (3) a class variable that points to the loaded file.
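A sketch of the file-loading case. The comment talks about memoization in general; this example swaps in `functools.cached_property` (Python 3.8+), which caches per instance, instead of `lru_cache` on a method (which keys on `self` and keeps instances alive while they are cached). `Config` and the `reads` counter are hypothetical names for illustration.

```python
from functools import cached_property
from pathlib import Path

class Config:
    def __init__(self, path):
        self.path = Path(path)
        self.reads = 0  # counts actual disk reads, for illustration only

    @cached_property
    def contents(self):
        # Runs on first access only; the result is then stored on the
        # instance, replacing an "is it loaded?" check, a separate load
        # method, and a hand-managed attribute.
        self.reads += 1
        return self.path.read_text()
```

Usage would look like `Config("settings.txt").contents`; repeated accesses on the same instance reuse the stored string.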

GeoAtreides 2186 days ago

That sounds very interesting. Can you explain the problem in a bit more detail (I'm not sure I understand how clock skew affects Python code) and how the lru_cache decorator fixed it?

zo1 2186 days ago

It's not clock skew of the CPU or Python in any sense. My theory: their script runs every 10 seconds without taking into account how long previous runs took, and they noticed duplicates because the same message would be picked up by multiple executions of their polling script, probably because of network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates by the message content and the user who wrote it. Really bad design if you ask me, as it's essentially "throwing spaghetti" at the wall instead of figuring out what's actually going wrong. As other people have mentioned, they should instead filter out duplicates by a plain ID, because that uniquely identifies each message.

whalesalad 2186 days ago

Essentially, the OP is using the decorator to prevent the function from ever running more than once for the same arguments. It’s at-most-once.

faceplanted 2186 days ago

More specifically, it prevents a call from being repeated _in quick succession_: if there are enough calls with other arguments in between repetitions, the entry will fall out of the cache (lru_cache keeps 128 entries by default) and the call will be processed again.
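A small sketch of that eviction behaviour, using a deliberately tiny `maxsize=2` so it shows up after just two other calls. The `send` function and the `delivered` list are hypothetical.

```python
from functools import lru_cache

delivered = []

@lru_cache(maxsize=2)  # deliberately tiny so eviction is easy to see
def send(username, message):
    delivered.append((username, message))

send("alice", "hi")   # miss: delivered
send("alice", "hi")   # hit: suppressed
send("bob", "yo")     # miss: delivered
send("carol", "hey")  # miss: delivered; evicts ("alice", "hi"), the least recently used
send("alice", "hi")   # miss again: delivered a second time
```

With the default size of 128 the same thing happens, just after more intervening calls, which is why the suppression window is not a fixed length of time.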

mac-chaffee 2186 days ago

Maybe clock skew is the wrong term. Basically, we'd run the script every 10 seconds, and it would use the Jira API to fetch any comments made in the last 10 seconds. Sometimes Jira would return the same comment twice, causing the script to send out two notifications about 10 seconds apart.
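One plausible mechanism, which is an assumption since the thread never pins down the exact cause: the 10-second lookback windows overlap whenever a run starts less than 10 seconds after the previous one (scheduling jitter, slow API responses, inclusive boundaries). This simulation uses made-up timestamps and a hypothetical `fetch_recent` in place of the real Jira query.

```python
# Hypothetical comment timestamps, in seconds since the bot started.
comments = {"c1": 9.8, "c2": 15.0}

def fetch_recent(now, lookback=10.0):
    """Stand-in for a 'comments from the last `lookback` seconds' query."""
    return [cid for cid, ts in comments.items() if now - lookback <= ts <= now]

# Scheduled every 10 s, but assume the second run fires 0.5 s early.
run_times = [10.0, 19.5]
notifications = [cid for t in run_times for cid in fetch_recent(t)]
# notifications == ["c1", "c1", "c2"]: c1 falls inside both windows
```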

schuppentier 2186 days ago

That sounds to me like a push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events such as a new comment. That way, you should be able to avoid the concurrency problems completely.

zo1 2186 days ago

The comment entry you get back from JIRA should have a unique "ID" attached to it that you can use for a duplicate check. By your description, your function signature is send(username, message), in which case this solution will fail if the same user makes the same comment on two different issues.

Have a look at their REST API docs for retrieving comments on an Issue:

https://docs.atlassian.com/software/jira/docs/api/REST/8.10....

You'll see that they respond with an ID for each comment. That ID is unique and probably the primary key of the comment in the JIRA application's DB. That is the key you need to use for a duplicate check, not the username and message content.
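A minimal sketch of deduplicating on the comment ID. It assumes the comments have already been reduced to plain dicts with hypothetical `id`, `author`, and `body` keys; the real Jira payload is richer, so check the REST docs for its exact shape. Like the lru_cache version, the in-memory set only works while the bot stays running.

```python
delivered = []
seen_ids = set()  # in-memory only; grows without bound and is lost on restart

def forward(comment):
    """Forward a comment once, keyed on its ID rather than on its text."""
    if comment["id"] in seen_ids:
        return False
    seen_ids.add(comment["id"])
    delivered.append((comment["author"], comment["body"]))
    return True

c1 = {"id": "10001", "author": "alice", "body": "LGTM"}  # comment on issue A
c2 = {"id": "10002", "author": "alice", "body": "LGTM"}  # same text on issue B
for c in [c1, c1, c2]:  # c1 returned by two overlapping polls
    forward(c)
```

Both "LGTM" comments get through because they have different IDs, while the repeated c1 is dropped. A cache keyed on `(username, message)` would have sent only one of them.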

mac-chaffee 2186 days ago

I checked the code (probably should have done that in the first place) and it turns out we include a Jira link in the message, which contains the comment ID.

crystaln 2186 days ago

This is neat, although I would agree it isn’t the right tool. The reason is that the time until a second message with the same content can be sent is indeterminate: it depends on how many intervening messages are sent. If your app does want to send the same notice twice, an hour apart, with no intervening messages, the second one would be suppressed, so the behavior is unpredictable.

A timed algorithm seems better suited. That said, it probably doesn’t matter much.
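The comment doesn't specify an algorithm. One simple reading of "timed" is to suppress a repeat only if the same key was let through within a fixed window. `TimedDeduper` and the 60-second window are hypothetical choices. The clock is injectable so the behaviour can be shown without sleeping.

```python
import time

class TimedDeduper:
    """Suppress a repeated key only if it was let through less than `window` seconds ago."""

    def __init__(self, window, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.last_sent = {}  # key -> time it was last let through

    def should_send(self, key):
        now = self.clock()
        # Drop expired entries so the dict doesn't grow without bound.
        self.last_sent = {k: t for k, t in self.last_sent.items() if now - t < self.window}
        if key in self.last_sent:
            return False
        self.last_sent[key] = now
        return True

fake_now = [0.0]
dedupe = TimedDeduper(window=60, clock=lambda: fake_now[0])
key = ("alice", "Build is green")
results = []
for t in (0.0, 10.0, 3600.0):
    fake_now[0] = t
    results.append(dedupe.should_send(key))
# results == [True, False, True]
```

Unlike the LRU version, the hour-later repeat goes through regardless of how many other messages were sent in between.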

andreareina 2186 days ago

Are you not using a message id of some sort?

mac-chaffee 2186 days ago

Do you mean in the send() function? No, sorry, I'm talking about a chatbot that forwards Jira comments into a chatroom, so the combination of username and message is usually unique enough on its own to produce a good hash for lru_cache without needing a message_id, if that's what you're referring to.

raymondh 2186 days ago

Well, that's pretty cool.
