by jmakov 621 days ago
How much data can it handle: 10GB, 1TB, 100TB? Is it memory-mapped, or does the data need to fit into memory (RAM, VRAM)? Is streaming supported, i.e. can I point it at a 100TB dataset and cruise through it? Does it take a single Parquet file or a Parquet dataset? What about Delta Lake? Are outliers drawn, or are you doing some sort of sampling/smoothing? It would also be great to have a comparison with similar tools in this space, e.g. https://github.com/finos/perspective and hvPlot+Datashader.

The data needs to fit in RAM, and the graphics in VRAM. Let's say 100GB, or more if you filter out some rows during import. Data is ingested into an in-house database designed to refresh the ever-changing set of selected rows as quickly as possible, so that you can conduct a true investigation. You can load as many Parquet files as you want in one go, provided they have the same structure. Any outlier in any visual representation will be drawn, as this is a requirement for detecting weak signals and anomalies.
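
For anyone who wants to check the "same structure" condition before an import, here is a minimal sketch. It assumes pyarrow is installed and is not the tool's own importer; `pyarrow_schema_key` and `group_by_schema` are hypothetical helper names. It reads only each file's schema and groups paths by their column names and types, so a single group means all files share one structure.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def pyarrow_schema_key(path: str) -> Hashable:
    """Return a hashable (column name, column type) summary of a Parquet file.

    Assumption: pyarrow is available. Only the schema is read, not the row data.
    """
    import pyarrow.parquet as pq  # third-party, imported lazily

    schema = pq.read_schema(path)
    return tuple((name, str(dtype)) for name, dtype in zip(schema.names, schema.types))


def group_by_schema(
    paths: Iterable[str],
    schema_key: Callable[[str], Hashable] = pyarrow_schema_key,
) -> Dict[Hashable, List[str]]:
    """Group file paths by schema; one group means every file has the same structure."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for path in paths:
        groups[schema_key(path)].append(path)
    return dict(groups)
```

Calling `group_by_schema(["a.parquet", "b.parquet"])` and checking that the result has exactly one key is enough to confirm the files can go into the same import; more than one key tells you which files differ.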

Comparisons with the tools you mentioned would indeed be interesting; writing a blog post would be a good idea, I guess! I wrote a comparison with ELK here: https://squey.org/domains/cybersecurity/pentesteracademy-mac...