by felipe_aramburu
2507 days ago
This is a distributed SQL engine, not a database. We store no data. You keep your data in HDFS, S3, a POSIX filesystem, NFS, etc., and we let you query it directly from those filesystems in the file formats you already have. The file formats cuDF supports are listed here: https://github.com/rapidsai/cudf/tree/branch-0.9/cpp/src/io

You can try it out yourself here: https://colab.research.google.com/drive/1r7S15Ie33yRw8cmET7_... or use the image on Docker Hub: https://hub.docker.com/r/blazingdb/blazingsql/

The benefits are:

Greatly increased processing capacity. The GPUs we use can execute orders of magnitude more instructions per second than a CPU.

Decompression and parsing of formats like CSV and Parquet happen on the GPU, orders of magnitude faster than the best CPU alternatives.

You can hand the output of your queries to machine learning jobs with zero-copy IPC and get the results back the same way. We are all about interoperability with the RAPIDS (rapidsai) ecosystem.
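To make the "engine, not a database" point concrete, here is a toy, CPU-only sketch in plain Python. It is not BlazingSQL's API and does nothing on the GPU; `query_csv` and `demo` are hypothetical names made up for illustration, and a temporary directory stands in for HDFS/S3/NFS. The only idea it shows is that the query runs against the file where it already lives, and nothing is ingested into a separate store first.

```python
import csv
import os
import tempfile


def query_csv(path, columns, where=lambda row: True):
    """Stream rows from a CSV file in place and yield the selected
    columns of every row that satisfies the `where` predicate."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if where(row):
                yield tuple(row[c] for c in columns)


def demo():
    # A temporary directory stands in for the user's existing storage.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "trips.csv")
        with open(path, "w", newline="") as f:
            f.write("id,city,fare\n1,Lima,12.5\n2,Austin,30.0\n3,Lima,7.25\n")
        # Rough equivalent of: SELECT id, fare FROM trips WHERE city = 'Lima'
        # The list is built while the file still exists.
        return list(
            query_csv(path, ["id", "fare"], where=lambda r: r["city"] == "Lima")
        )


if __name__ == "__main__":
    print(demo())
```

The engine owns the scan, the filter and the projection; the file stays wherever you already keep it. In the real system those steps (plus decompression and parsing) are the parts that run on the GPU.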
// sorry if this is a stupid question.