by alan-hn 730 days ago
Whenever I work with large datasets, I use a small subset of the overall data for testing while I build the pipeline. This avoids long run times and allows quick iteration while I get everything set up to run against the full dataset.
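The comment doesn't say how the subset is picked. Here is a minimal sketch that assumes a uniform random sample drawn with reservoir sampling (Algorithm R). This reads the data once and keeps only `k` items in memory, so it still works when the full dataset doesn't fit in RAM. `run_pipeline`, `transform`, and the `sample_size` switch are hypothetical names for illustration, not anything from the original post.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Pick k items uniformly at random from a stream (Algorithm R).

    Holds at most k items in memory. A fixed seed makes the dev subset
    reproducible between test runs.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen existing slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def transform(record: int) -> int:
    # Hypothetical pipeline step standing in for the real work.
    return record * 2


def run_pipeline(records: Iterable[int], sample_size: Optional[int] = None) -> List[int]:
    # Pass a small sample_size while developing; pass None for the full run.
    if sample_size is not None:
        records = reservoir_sample(records, sample_size)
    return [transform(r) for r in records]


if __name__ == "__main__":
    data = range(10_000)
    dev_out = run_pipeline(data, sample_size=100)  # fast iteration
    print(len(dev_out))  # 100
```

Taking the first N rows (for example with `itertools.islice`) is cheaper still, but it can give an unrepresentative subset if the data is sorted or grouped, which is why this sketch samples randomly.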