by felipe_aramburu 2678 days ago

In lots of ways. AresDB has ingestion. Users take their data and say, "Here, make a table from this and persist that data the way I want you to. Great, now I can query it and go about my merry business." I see lots of nice, fancy optimizations in there. The kinds of things a small company like Blazing can only marvel at!

I also see a completely different use case. We don't want to ingest data into Blazing. That is NOT our cup of tea. We will make tools to help you persist data quickly. Working on it. But we want to read data the way YOU store it without asking you to duplicate it into our system and requiring significant effort from the user.

You have files. Great, we can read them, query them, do all kinds of acrobatics on them, and we don't need you to store redundant copies of what you already have.

The files are stored in S3 or HDFS and other file systems. Great, we have built connections to interface with those file systems and you can query directly from them.

A user should be able to get Blazing's Docker container, register a filesystem with the engine, and start querying their files within minutes of having launched the container. When we don't do this, we fail. It does not seem to me that AresDB is trying to do this, but I could be completely wrong, and if you think so, please tell me why!
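To make the "register a filesystem, then query the files where they already live" idea concrete, here is a minimal Python sketch. It is not BlazingDB's API: `FileSystemRegistry`, `register`, `resolve`, and `query_csv` are hypothetical names, and a local temporary directory with a CSV file stands in for S3 or HDFS.

```python
import csv
import os
import tempfile


class FileSystemRegistry:
    """Hypothetical registry mapping a URI prefix to a storage root."""

    def __init__(self):
        self._roots = {}

    def register(self, prefix, root):
        # Remember where files under this prefix actually live.
        self._roots[prefix] = root

    def resolve(self, uri):
        # Translate a registered URI into a concrete path.
        for prefix, root in self._roots.items():
            if uri.startswith(prefix):
                return os.path.join(root, uri[len(prefix):])
        raise KeyError(f"no filesystem registered for {uri!r}")


def query_csv(registry, uri, columns, predicate):
    """Stream matching rows straight from the user's file, with no ingestion step."""
    with open(registry.resolve(uri), newline="") as f:
        for row in csv.DictReader(f):
            if predicate(row):
                yield tuple(row[c] for c in columns)


def demo():
    # Stand-in for the user's existing storage (assumption: a local directory).
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "trips.csv"), "w", newline="") as f:
            csv.writer(f).writerows(
                [["city", "fare"], ["SF", "12.5"], ["NYC", "30.0"], ["SF", "7.0"]]
            )
        registry = FileSystemRegistry()
        registry.register("data://", root)
        return list(
            query_csv(registry, "data://trips.csv", ["city", "fare"],
                      lambda r: float(r["fare"]) > 10)
        )
```

The point of the sketch is only the shape of the workflow: the engine keeps a pointer to the storage location and reads on demand, so the user never makes a second copy of the data.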

BlazingDB is meant to be connected to other tools in the RAPIDS ecosystem. You interact with it through Python. You can zero-copy IPC the data to any other tool in the RAPIDS ecosystem. Want to power a dashboard? Sure, but this isn't the use case that we are optimizing for. We don't have a JDBC connection. We don't connect to Tableau.

Blazing focuses on interpreting rather than JITing. Instead of trying to make the most performant compiled code possible by JITing exactly what we want, we pay a processing cost that reduces I/O. We convert logical plans into physical execution plans suited to a SIMD architecture, where optimization centers on minimizing materializations and global memory accesses. We tried JIT but couldn't find a way to compile plans in less than a few hundred nanoseconds, at which point the overhead of compiling exceeded that of interpreting operations in a kernel.
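A rough way to picture the interpreting approach is the Python sketch below. It is not BlazingDB's code: the tuple-based expression format, the `OPS` table, and the register layout are all assumptions made for illustration. A nested logical expression is flattened into a list of register-based instructions, and a single loop evaluates the whole list per row, keeping intermediates in local registers instead of materializing an intermediate column for each operator.

```python
import operator

# Hypothetical opcode table for binary operators.
OPS = {"add": operator.add, "mul": operator.mul, "gt": operator.gt}


def to_physical_plan(expr, plan=None):
    """Flatten a nested expression into a flat instruction list.

    expr is ("col", name), ("lit", value), or (op, left, right).
    Instruction k writes its result to register k.
    Returns (plan, index_of_result_register).
    """
    if plan is None:
        plan = []
    kind = expr[0]
    if kind == "col":
        plan.append(("load", expr[1]))
    elif kind == "lit":
        plan.append(("const", expr[1]))
    else:
        _, left = to_physical_plan(expr[1], plan)
        _, right = to_physical_plan(expr[2], plan)
        plan.append((kind, left, right))
    return plan, len(plan) - 1


def interpret(plan, columns, n_rows):
    """Evaluate the plan row by row; on a GPU each row would map to a thread."""
    out = []
    for i in range(n_rows):
        regs = [None] * len(plan)  # per-row registers, never written back as columns
        for r, ins in enumerate(plan):
            if ins[0] == "load":
                regs[r] = columns[ins[1]][i]
            elif ins[0] == "const":
                regs[r] = ins[1]
            else:
                regs[r] = OPS[ins[0]](regs[ins[1]], regs[ins[2]])
        out.append(regs[-1])  # only the final result is materialized
    return out


# Logical expression: a + b * 2 > 10
expr = ("gt", ("add", ("col", "a"), ("mul", ("col", "b"), ("lit", 2))), ("lit", 10))
plan, result_reg = to_physical_plan(expr)
result = interpret(plan, {"a": [1, 5, 8], "b": [2, 3, 1]}, 3)
```

Here `result` is `[False, True, False]`. Each input value is read once and only the final column is written, which is the "minimize materializations and global memory accesses" trade-off in miniature; the cost is the per-instruction dispatch inside the loop.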