| Fair benchmarks would justify merging aiopandas into pandas. Benchmark grid axes: aiopandas, dtype_backend="pyarrow", dask-cudf

pandas pyarrow docs: https://pandas.pydata.org/docs/dev/user_guide/pyarrow.html

/? async pyarrow: https://www.google.com/search?q=async+pyarrow

/? repo:apache/arrow async language:Python : https://github.com/search?q=repo%3Aapache%2Farrow+async+lang... :

test_flight_async.py: https://github.com/apache/arrow/blob/main/python/pyarrow/tes...

pyarrow/src/arrow/python/async.h: https://github.com/apache/arrow/blob/main/python/pyarrow/src... : "Bind a Python callback to an arrow::Future."

--

dask-cudf: https://docs.rapids.ai/api/dask-cudf/stable/ :

> Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU or multi-node execution on their own. You must also deploy a dask.distributed cluster to leverage multiple GPUs. We strongly recommend using Dask-CUDA to simplify the setup of the cluster, taking advantage of all features of the GPU and networking hardware.

cudf.pandas > FAQ > "When should I use cudf.pandas vs using the cuDF library directly?" https://docs.rapids.ai/api/cudf/stable/cudf_pandas/faq/#when... :

> cuDF implements a subset of the pandas API, while cudf.pandas will fall back automatically to pandas as needed.

> Can I use cudf.pandas with Dask or PySpark?

> [Not at this time, though you can change the dask df to e.g. cudf, which does not implement the full pandas dataframe API]

--

dask.distributed docs > Asynchronous Operation; re Tornado or asyncio:
https://distributed.dask.org/en/latest/asynchronous.html#asy...

--

tqdm.dask, tqdm.notebook: https://github.com/tqdm/tqdm#ipythonjupyter-integration

  import time
  from tqdm.notebook import trange, tqdm

  for n in trange(10):
      time.sleep(1)  # stand-in for real work; the bar advances once per iteration
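
Re the benchmark grid at the top: a minimal stdlib-only sketch of how the cells could be enumerated and timed. Everything here is an assumption for illustration: the ENGINES and DTYPE_BACKENDS lists are just labels ("numpy_nullable" is added as a baseline for dtype_backend), stand_in_workload is a hypothetical placeholder for a real DataFrame operation, and nothing below imports or calls pandas, aiopandas, pyarrow, or dask-cudf.

```python
import asyncio
import itertools
import time

# Hypothetical grid axes; these are labels only, no library is imported.
ENGINES = ["pandas", "aiopandas", "dask-cudf"]
DTYPE_BACKENDS = ["numpy_nullable", "pyarrow"]


def stand_in_workload(engine: str, dtype_backend: str) -> int:
    """Placeholder for a real DataFrame operation; it just sums integers."""
    return sum(range(10_000))


async def time_cell(engine: str, dtype_backend: str, repeats: int = 3) -> dict:
    """Time one grid cell, running the blocking workload in a worker thread."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        await asyncio.to_thread(stand_in_workload, engine, dtype_backend)
        best = min(best, time.perf_counter() - start)
    return {"engine": engine, "dtype_backend": dtype_backend, "best_s": best}


async def run_grid() -> list:
    """Run every (engine, dtype_backend) cell one at a time so timings don't overlap."""
    results = []
    for engine, backend in itertools.product(ENGINES, DTYPE_BACKENDS):
        results.append(await time_cell(engine, backend))
    return results


results = asyncio.run(run_grid())
for row in results:
    print(f"{row['engine']:>10} {row['dtype_backend']:>15} {row['best_s']:.6f}s")
```

Cells run sequentially on purpose: overlapping them would measure scheduler contention rather than the workload. A fair comparison would replace stand_in_workload with the same operation on the same data for each engine, and would also want warm-up runs and a concurrent-load variant, since that is where an async wrapper would be expected to matter.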
--

But then TPUs instead of, or in addition to, async GPUs; TensorFlow TPU docs: https://www.tensorflow.org/guide/tpu |