


	by drabinowitz 2415 days ago
	You can include a locked_at field and have your update query match not_started rows as well as started rows whose locked_at is older than the job timeout.
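A minimal sketch of that claim query, using SQLite only so it runs anywhere. The table layout, the `claim_job` helper and the 60-second timeout are assumptions for illustration, not something from the original post; only `locked_at`, `not_started` and `started` come from the comment.

```python
import sqlite3

JOB_TIMEOUT_SECONDS = 60  # assumed global job timeout


def make_db():
    """Create an in-memory jobs table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'not_started',"
        " locked_at REAL)"
    )
    return conn


def claim_job(conn, now):
    """Claim one unstarted job, or one whose lock is older than the timeout.

    Returns the claimed job id, or None if nothing is claimable.
    """
    cutoff = now - JOB_TIMEOUT_SECONDS
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM jobs"
            " WHERE status = 'not_started'"
            "    OR (status = 'started' AND locked_at < ?)"
            " ORDER BY id LIMIT 1",
            (cutoff,),
        ).fetchone()
        if row is None:
            return None
        # Repeat the condition in the UPDATE so a row that changed in the
        # meantime is not claimed twice.
        cur = conn.execute(
            "UPDATE jobs SET status = 'started', locked_at = ?"
            " WHERE id = ?"
            "   AND (status = 'not_started'"
            "        OR (status = 'started' AND locked_at < ?))",
            (now, row[0], cutoff),
        )
        return row[0] if cur.rowcount == 1 else None
```

A worker would call `claim_job(conn, time.time())` in its polling loop; a job whose worker died becomes claimable again once its `locked_at` falls behind the cutoff.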


shrimpx 2415 days ago

A global job timeout can be unreasonable when the workload has high variance, e.g. some jobs take 0.5 seconds and others 30 seconds. You might set a global timeout of, say, 60s, but it sucks to wait 59.5s to reap a short job whose worker crashed. A better system is to have workers update a timestamp on an interval and reap any jobs that haven't been updated in N seconds.
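Roughly, the heartbeat idea could look like the sketch below. The `heartbeat_at` column, the helper names and the value N = 5 are all made up for illustration; SQLite stands in for whatever database the queue actually lives in.

```python
import sqlite3

REAP_AFTER_SECONDS = 5  # the "N" from the comment; value is an assumption


def make_heartbeat_db():
    """Create an in-memory jobs table with a heartbeat column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'not_started',"
        " heartbeat_at REAL)"
    )
    return conn


def heartbeat(conn, job_id, now):
    """Called by a worker on an interval while it is still running the job."""
    with conn:
        conn.execute(
            "UPDATE jobs SET heartbeat_at = ? WHERE id = ? AND status = 'started'",
            (now, job_id),
        )


def reap_stale(conn, now):
    """Put jobs with no heartbeat in the last N seconds back in the queue.

    Returns the number of jobs reaped.
    """
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'not_started', heartbeat_at = NULL"
            " WHERE status = 'started' AND heartbeat_at < ?",
            (now - REAP_AFTER_SECONDS,),
        )
        return cur.rowcount
```

With this shape, a crashed worker's job is reaped about N seconds after its last heartbeat, regardless of how long the job would normally run.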


pas 2415 days ago

It's a trade-off between updates per second and latency.

Maybe simply using a timeout per job type is a better way. (That of course trades off simplicity.)
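A per-job-type timeout could be as small as a lookup table. The job type names and timeout values here are hypothetical, picked only to mirror the short-job versus long-job example from earlier in the thread.

```python
# Hypothetical job types and timeouts, in seconds.
TIMEOUT_BY_JOB_TYPE = {
    "thumbnail": 2.0,         # short jobs get reaped quickly
    "video_transcode": 60.0,  # long jobs get more slack
}
DEFAULT_TIMEOUT = 60.0  # assumed fallback for unlisted job types


def is_expired(job_type, locked_at, now):
    """Return True if a started job has held its lock longer than its type allows."""
    timeout = TIMEOUT_BY_JOB_TYPE.get(job_type, DEFAULT_TIMEOUT)
    return now - locked_at > timeout
```

This needs no extra writes from the workers, at the cost of maintaining a timeout for every job type.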


shrimpx 2414 days ago

I agree. Frequency of updates also becomes more of an issue as you add workers. Say you have 1000 workers, each updating every 2 seconds. That's ~500 timestamp update statements per second, which is not trivial in terms of added load on the DB.
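The arithmetic behind that figure, as a tiny helper (the function name is made up for illustration, and it assumes every worker is busy and sends exactly one update per interval):

```python
def heartbeat_writes_per_second(workers, interval_seconds):
    """Average heartbeat UPDATE statements per second across all workers."""
    return workers / interval_seconds


# The example from the comment: 1000 workers, one update every 2 seconds.
rate = heartbeat_writes_per_second(1000, 2)  # 500.0
```

Doubling the interval halves the write load but also roughly doubles how long a dead worker can go unnoticed.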
