|
|
|
|
|
by felipe_aramburu
2682 days ago
|
|
1. You don't have to fit your whole workload on the GPU you can process it in batches like you would for a workload that doesn't fit into memory on a non gpu solution. You don't need PB of GPU memory to run PB workloads. 2. The dataset is trivially small because this is a new engine built for the rapids eco system and it is limited for the time being to a single node. We are releasing our distributed version for GTC (mid March) and will be able to give you more reasonable comparisons. This is a similar path of development to our pre Rapids engine which went from single node to distributed in about a month because we have built this engine to be distributed. Right now we are finishing up UCX integration which is the layer we will be using to communicate between all the nodes. 3. You can always try it out. Its own dockerhub (see links in this post) and if you want to run distributed workloads right now you can manage that process using dask by handling the splitting up of the job yourself. In a few weeks you will be able to have the job split up for you automatically without need for the user to be aware of the size of the cluster or how to distribute data across it. |
|
With GPUs, the software progression is single gpu -> multi-gpu -> multinode multigpu. By far, the hardest step is single gpu. They're showing that.