by dr_kiszonka 1872 days ago
I usually use Python and R for analysis. However, when dealing with larger datasets, e.g., 0.5 to 2 PB, I have to rely on SQL/BigQuery because I can't get Python and R to deal with such workloads in a reasonable time. I tried Dask, but I couldn't resolve a few bugs it had at the time.

If you were to find outliers in a 1 PB table, what tools would you use?