by bckr 1370 days ago

Anyone using this?

It's not spelled out, but apparently you just run the Python file containing the app definition and leave it running in the background?

Looks very clean and pythonic.

2 comments

Miksus 1370 days ago

Of course I have this running, though it's still running an older version (I've been too busy developing this). It has been running for over half a year for my scrapers without a single interruption, even though the machine has the worst specs available. I have tested this on Linux/Unix and Windows at least. I have also gotten messages from various people saying they are using it. Some have said they migrated from Celery or other alternatives as they found Rocketry more suitable for their needs.

And that's true: it's 100% Python and basically there is a main loop that checks the starting conditions of tasks (and some other things), and if a task's starting condition is met, the task is run. Tasks can be executed synchronously by setting execution to "main" or concurrently with async, threading or multiprocessing. Maybe in the future with another interpreter as well. The main loop is left running in the background.
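To make the description above concrete, here is a minimal sketch of a condition-checking main loop written with the standard library only. It is not Rocketry's implementation: `Task`, `run_cycle` and `main_loop` are hypothetical names, and only the synchronous case is shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical task: a function plus a start condition."""
    name: str
    func: Callable[[], None]
    condition: Callable[[], bool]  # checked on every cycle
    runs: int = 0


def run_cycle(tasks: List[Task]) -> List[str]:
    """Check each start condition once and run the tasks that are due."""
    started = []
    for task in tasks:
        if task.condition():
            task.func()  # runs synchronously, in the loop's own thread
            task.runs += 1
            started.append(task.name)
    return started


def main_loop(tasks: List[Task], cycle_sleep: float = 0.1,
              max_cycles: Optional[int] = None) -> None:
    """Repeat the check-and-run cycle, sleeping between cycles."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        run_cycle(tasks)
        time.sleep(cycle_sleep)  # keeps resource usage low between checks
        cycles += 1
```

With `max_cycles=None` the loop runs until the process is stopped, which matches the "leave it running in the background" usage discussed above.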

So in short, it's a Python loop that's constantly running. It sleeps a defined amount of time after checking a set of tasks to lower resource consumption, but you can also create a task with execution set to "main" and do more sophisticated sleeping, like "sleep more when CPU usage is X%", or estimate when the next task should start from the tasks' conditions.
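The idea of estimating the sleep time from the next expected start can be sketched as below. This is an illustration under assumptions, not a Rocketry API: `estimate_sleep` and its parameters are hypothetical, and it assumes each task can either report the timestamp of its next start or `None` when that cannot be predicted.

```python
from typing import Iterable, Optional


def estimate_sleep(next_starts: Iterable[Optional[float]], now: float,
                   min_sleep: float = 0.1, max_sleep: float = 60.0) -> float:
    """Return how many seconds the loop could sleep before the next check.

    `next_starts` holds one timestamp (in seconds) per task, or None when
    the task's next start time cannot be predicted from its condition.
    """
    starts = list(next_starts)
    if not starts or any(t is None for t in starts):
        # At least one task is unpredictable: keep polling at the base rate.
        return min_sleep
    wait = min(starts) - now
    # Clamp so the loop never busy-waits and never oversleeps for too long.
    return max(min_sleep, min(wait, max_sleep))
```

For example, with tasks due at 105 and 110 seconds and the clock at 100, the function suggests sleeping 5 seconds rather than polling at the base rate.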

And thanks for the positive comment!

pid-1 1370 days ago

Hey cool project, congrats!

How does Rocketry save execution state? Like, if it crashes and comes back up again, does it know which tasks were executed and which ones were not?

Miksus 1370 days ago

Thanks a lot, nice to hear!

The system knows which task ran and when by extending logging (from the standard library). There is a logger called "rocketry.task" that should have a handler which can also be read from: redbird.logging.RepoHandler. An in-memory logger is created if nothing is specified. This handler abstracts simple reads and writes to a data store, which can be an SQL database, an in-memory Python list, MongoDB or a CSV file.
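The general pattern of a logging handler that can be both written to and read from can be sketched with the standard library alone. This is not redbird's RepoHandler: `ListRepoHandler`, `log_action`, the logger name "demo.task" and the `task_name`/`action` fields are all assumptions made for illustration.

```python
import logging


class ListRepoHandler(logging.Handler):
    """Hypothetical handler that stores records in a list and can be queried."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # Keep only the fields needed to reconstruct task state later.
        self.records.append({
            "task_name": getattr(record, "task_name", None),
            "action": getattr(record, "action", None),
            "created": record.created,
        })

    def last(self, task_name, action):
        """Return the timestamp of the latest matching record, or None."""
        for row in reversed(self.records):
            if row["task_name"] == task_name and row["action"] == action:
                return row["created"]
        return None


logger = logging.getLogger("demo.task")  # stand-in name, not Rocketry's logger
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListRepoHandler()
logger.addHandler(handler)


def log_action(task_name, action):
    """Write one task event through the ordinary logging machinery."""
    logger.info("%s %s", task_name, action,
                extra={"task_name": task_name, "action": action})
```

Swapping the list for a database table or a CSV file would change only the handler, while the code that logs task events stays the same.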

It seems I forgot to implement a method mentioned in the docs, but here's an example of specifying a task log repo: https://github.com/Miksus/rocketry/issues/108#issuecomment-1...

The latest success time, starting time etc. are also stored in the tasks themselves, and there is some optimization (which can be turned off) to reduce the reads in some cases. At start-up, these attributes are set on each task (if logs are found).
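Restoring per-task attributes from stored logs at start-up might look roughly like this. It is a sketch under assumptions: `TaskState`, `restore_states` and the row layout (`task_name`, `action`, `created`) are hypothetical and not taken from Rocketry.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional


@dataclass
class TaskState:
    """Hypothetical cached timestamps kept on a task."""
    last_run: Optional[float] = None
    last_success: Optional[float] = None
    last_fail: Optional[float] = None


def restore_states(rows: Iterable[Mapping]) -> Dict[str, TaskState]:
    """Rebuild each task's latest timestamps from stored log rows."""
    states: Dict[str, TaskState] = {}
    for row in rows:
        state = states.setdefault(row["task_name"], TaskState())
        attr = "last_" + row["action"]
        if not hasattr(state, attr):
            continue  # ignore actions this sketch does not track
        current = getattr(state, attr)
        if current is None or row["created"] > current:
            setattr(state, attr, row["created"])
    return states
```

Once the states are cached this way, conditions such as "has this task already succeeded today" can be answered without reading the log store on every cycle.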

aidos 1369 days ago

This looks brilliant. I like that it’s kept light as a concept - feels like you can just sprinkle it over your existing tasks without getting bogged down in complex configuration.

We have a couple of hand-rolled variants of this that run into all the issues this solves. Will definitely look at taking this for a spin.

dwils098 1370 days ago

I like the idea, but this should be an API that can be accessed from any language.

Miksus 1370 days ago

If you want an API (or UI), just clone this and modify it as you need: https://github.com/Miksus/rocketry-with-fastapi. I also wrote an article on Medium about how it works with FastAPI: https://itnext.io/scheduler-with-an-api-rocketry-fastapi-a0f...

Rocketry plays quite nicely with FastAPI.

pid-1 1370 days ago

I was scratching my head, looking at the docs and asking myself "Ok do I need a database? Is the scheduler separated from workers? Does it have a UI?"

Being just a lib is actually quite refreshing compared to complex behemoths like Airflow. I guess you could just use your favorite service runner (systemd, k8s, nomad, none at all...).
