Hacker News new | ask | show | jobs
by sjm217 15 days ago
The dataset was 136GB (about 7GB per annum), and the Python implementation took 45 hours for each run. The Julia code that processed the whole dataset and built the database took 5 hours, which made iterative development much more pleasant. Of course, later stages in the pipeline had much less data to process and so were much faster. With metadata and indices, that was about 3GB. It's bigger than your estimate since there are multiple observations of the same satellite.