Hacker News new | ask | show | jobs
by GeoAtreides 2185 days ago
That sounds very interesting. Can you detail the problem a bit (I'm not sure I understand how clock skew affects python code) and how lru_cache decorator fixed it?
3 comments

It's not clock skew of the CPU/Python in any sense. My theory: Their script is run every 10seconds without taking into account the execution time of previous runs, and they noticed duplicates as the same message would be picked up on multiple executions of their polling script because of probably network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates according to the message content and the user that made it. Really bad design if you ask me, as it's essentially "throwing spaghetti" onto the wall instead of figuring out what's really going wrong. As other people have mentioned, they should instead be using a plain ID to filter out duplicates because it is the most unique identifier.
Essentially the OP is using the decorator to prevent the function call from ever being run more than one time. It’s at most once.
More specifically, it prevents a call being repeated _in quick succession_, if there are enough calls in between repetitions it'll fall out of the cache and be reprocessed.
Maybe clock skew is the wrong word. Basically we'd run the script every 10 seconds, which would use the Jira API to fetch any comments made in the last 10 seconds. So sometimes Jira would return the same comment twice, causing the script to send out two notifications, about 10 seconds apart.
That sounds to me like a Push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events like a new comment. That way you should be able to avoid the concurrency problems completely.