Sorry, I don't know what to answer. I don't think what I do qualifies as a "workload".
I have a process that generates lots of data. I put it in a huge multi-indexed dataframe that luckily fits in RAM. I then slice out the part I need and pass it on to some computation (at which point the data usually becomes a numpy array or a torch tensor). Core count is not really a concern, as there's not much going on other than slicing in memory.
The main gain I get from this approach is prototyping velocity and flexibility. It's certainly sub-optimal in terms of performance.
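
For a concrete picture of the slice-then-convert pattern, here is a minimal sketch. The level names (`run`, `step`, `channel`), the `value` column, and the toy data are all made up for illustration; the real frame is much larger and shaped differently.

```python
import numpy as np
import pandas as pd

# Hypothetical index levels and column name, purely for illustration.
index = pd.MultiIndex.from_product(
    [["run_a", "run_b"], [0, 1, 2], ["x", "y"]],
    names=["run", "step", "channel"],
)
df = pd.DataFrame(
    {"value": np.arange(len(index), dtype=np.float64)},
    index=index,
).sort_index()  # label slicing on a MultiIndex needs a sorted index

# Slice out one run and a range of steps, across all channels.
idx = pd.IndexSlice
part = df.loc[idx["run_a", 1:2, :], "value"]

# Hand the slice to the numerical code as a plain array (steps x channels).
arr = part.unstack("channel").to_numpy()
```

From there, something like `torch.from_numpy(arr)` would give a tensor if the downstream code wants one.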