| HN Mirror


by roaramburu 2102 days ago

Howdy, full disclosure: I'm the CEO at BlazingSQL (BSQL).

I'm not very familiar with Ares beyond the linked article, but we aren't a DBMS and we don't manage data in any way.

BlazingSQL is a SQL engine; it's easier to think of it as similar to SparkSQL, Presto, Drill, etc.

We're core contributors to RAPIDS cuDF (CUDA DataFrame), which is a Python and C++ library for Apache Arrow in GPU memory. The Python library follows a pandas-like API, and the compute kernels are in C/C++.

BSQL binds to the same C++ layer as the pandas-like cuDF. This lets users interact with a DataFrame through either SQL or the pandas-like API, depending on their needs or preferences. Because of this interoperability, the rest of the RAPIDS stack can be applied to a variety of use cases (data viz, ML, Graph, Signal Processing, DL, etc.) with the same DataFrame.
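To make the SQL-or-pandas point concrete, here is a minimal sketch. The table name `trips`, its columns, and the helper `sql_and_dataframe_views` are made up for illustration, and it assumes a machine with an NVIDIA GPU and both `cudf` and `blazingsql` installed, so treat it as a rough outline rather than a reference example.

```python
# Hypothetical query over a hypothetical "trips" table.
QUERY = """
SELECT passenger_count, SUM(fare) AS total_fare
FROM trips
GROUP BY passenger_count
"""


def sql_and_dataframe_views():
    """Run the same aggregation through SQL and through the pandas-like API."""
    # Imported lazily: both packages need an NVIDIA GPU and a RAPIDS install.
    import cudf
    from blazingsql import BlazingContext

    # A small GPU DataFrame with illustrative values.
    trips = cudf.DataFrame(
        {"passenger_count": [1, 2, 1, 3], "fare": [10.0, 7.5, 4.0, 12.0]}
    )

    # SQL path: register the existing cuDF DataFrame as a table and query it.
    bc = BlazingContext()
    bc.create_table("trips", trips)
    via_sql = bc.sql(QUERY)  # the result is a cuDF DataFrame

    # pandas-like path: the same aggregation on the same DataFrame.
    via_api = trips.groupby("passenger_count").agg({"fare": "sum"})

    return via_sql, via_api
```

Calling `sql_and_dataframe_views()` on a GPU machine should give two DataFrames with the same per-group totals, and either one can be handed to other RAPIDS libraries.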

The DataFrame also has performant libraries for IO, Joins, Aggregations, Math operations, and more.

Here is an example of running a query on ~1TB on a single GPU in under 9 minutes. The data was stored on AWS S3 in Apache Parquet. https://twitter.com/blazingsql/status/1303370102348361729

Here is an example of scaling that same query up to 32 GPUs and running it in 16 seconds. https://twitter.com/blazingsql/status/1304450203030880257
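As a back-of-the-envelope reading of those two numbers, the sketch below works out the implied throughput and speedup. It assumes "~1TB" means roughly 1000 GB and treats "under 9 minutes" as an upper bound of 540 seconds; the hardware used in the two runs may also differ, so these are rough figures only.

```python
# Figures taken from the two examples above; the rounding is an assumption.
DATA_GB = 1000.0             # "~1TB", assumed to be about 1000 GB
SINGLE_GPU_SECONDS = 9 * 60  # "under 9 minutes", so this is an upper bound
MULTI_GPU_SECONDS = 16
GPU_COUNT = 32


def throughput_gb_per_s(data_gb, seconds):
    """Average scan throughput in GB/s over the whole query."""
    return data_gb / seconds


single = throughput_gb_per_s(DATA_GB, SINGLE_GPU_SECONDS)  # at least ~1.85 GB/s
multi = throughput_gb_per_s(DATA_GB, MULTI_GPU_SECONDS)    # 62.5 GB/s

# Because the single-GPU time is an upper bound, so is the speedup.
speedup = SINGLE_GPU_SECONDS / MULTI_GPU_SECONDS  # at most 33.75x
efficiency = speedup / GPU_COUNT                  # at most ~1.05

print(f"single GPU: >= {single:.2f} GB/s")
print(f"{GPU_COUNT} GPUs: {multi:.1f} GB/s")
print(f"speedup: <= {speedup:.2f}x, per-GPU efficiency: <= {efficiency:.2f}")
```

Under these assumptions the speedup on 32 GPUs is close to 32x, i.e. roughly linear scaling for this query.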

Again, think of BSQL as a query engine that runs queries on data wherever and however you have it. Here is a BSQL user running 1-2 minute queries on 1.5TB of CSV files using 2 GPUs. https://twitter.com/tomekdrabas/status/1303824164273270789

Let me know if that helps at all (or not).