Have you considered using jax? you can efficiently compute the jacobian (even using the gpu if you want) without leaving python at all. The api is also numpy compatible!
We target very low-level & specific hardware. I don't think it would be easy to deploy Jax on it. But it's an interesting idea, maybe for robots running linux !