by blop 362 days ago
I found this PDF presentation with lots of great technical details about data management and a devops/infra-oriented view of this telescope: https://ci-compass.org/assets/602137/2025jan23_cicompass_rub...

Worth a read for the devops guys around here!

  - about 20 TB per day, around 100 PB expected for the whole survey
  - 0.5 PB Ceph cluster for local data
  - workloads on a 20-node Kubernetes cluster with Argo CD
  - physical infra managed with Puppet/Ansible
  - 100 Gb/s (+40 Gb/s backup) fiber connection to a US-based datacenter for further processing
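
As a rough sanity check on the figures in the list above, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes), a fully utilized link and no protocol overhead; the function names are made up for illustration and none of this comes from the presentation itself.

```python
# Back-of-the-envelope check: how long does one day of data take on the link?
# Assumptions (not from the slides): decimal units, 100% link utilization,
# no protocol overhead.

SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_TB = 8 * 10**12  # 1 TB = 10^12 bytes


def transfer_seconds(terabytes, link_gbps):
    """Seconds needed to push `terabytes` through a `link_gbps` Gb/s link."""
    return terabytes * BITS_PER_TB / (link_gbps * 10**9)


def average_gbps(terabytes_per_day):
    """Sustained rate in Gb/s needed to move a daily volume within 24 hours."""
    return terabytes_per_day * BITS_PER_TB / SECONDS_PER_DAY / 10**9


if __name__ == "__main__":
    for link in (100, 40):
        minutes = transfer_seconds(20, link) / 60
        print(f"{link} Gb/s link: {minutes:.1f} min for 20 TB")
    print(f"average rate needed: {average_gbps(20):.2f} Gb/s")
```

Under those assumptions a full day of data fits through the primary link in under half an hour and through the backup link in a bit over an hour, so the link looks sized for bursts rather than for the daily average.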
3 comments

I wonder if they could reduce the data size at rest by using specialized compression techniques. You could probably build an averaged "model" of the sky observed by the telescope (probably accounting for stellar parallax and bright planets) and store only compressed diffs, not full images.

But I guess, since storage is relatively cheap, it's simply impractical to bother with such complexity.

There's quite a bit of black out there. That should compress easily.
The usual lossless image compression algorithms are a given. I am talking about compressing it further, since the telescope observes the same (or largely overlapping) patches of the sky and the most significant signal is stars, which are more or less "constant". At the very least, they could probably use lossless "animation" compression formats like APNG or FLIF for consecutive images of the same sky patch.
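
To make the idea concrete, here is a toy sketch in Python using only the standard library. It is not how any observatory pipeline works: the "frames" are synthetic byte arrays, the helper names are hypothetical, and the setup is deliberately favorable (the new frame differs from the reference in only a few pixels). Real exposures carry independent noise and seeing and pointing changes, so actual gains would be much smaller.

```python
# Toy illustration of "store a reference frame plus a compressed diff".
# Everything here is synthetic; the frames are flat arrays of 8-bit pixels.
import random
import zlib


def make_reference(n_pixels, seed=0):
    """Synthetic reference frame: random 8-bit pixel values."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_pixels))


def make_observation(reference, n_changed, seed=1):
    """Copy of the reference with `n_changed` pixels altered."""
    rng = random.Random(seed)
    frame = bytearray(reference)
    for idx in rng.sample(range(len(frame)), n_changed):
        frame[idx] = (frame[idx] + rng.randrange(1, 256)) % 256
    return bytes(frame)


def diff_encode(reference, frame):
    """Per-pixel difference modulo 256, so the step is exactly reversible."""
    return bytes((f - r) % 256 for f, r in zip(frame, reference))


def diff_decode(reference, diff):
    """Rebuild the frame from the reference and the stored diff."""
    return bytes((r + d) % 256 for r, d in zip(reference, diff))


reference = make_reference(100_000)
frame = make_observation(reference, n_changed=500)
diff = diff_encode(reference, frame)

full_size = len(zlib.compress(frame, 9))
diff_size = len(zlib.compress(diff, 9))

# The round trip must be lossless for this to count as lossless compression.
assert diff_decode(reference, diff) == frame
print(f"compressed frame: {full_size} bytes, compressed diff: {diff_size} bytes")
```

The diff is mostly zeros, so a general-purpose compressor shrinks it far more than the full frame. The cost is that every frame now depends on the reference being available and unchanged.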
Look up fpack and funpack.
Actually, the telescope devops guys were hiring a couple of years ago on HN: https://news.ycombinator.com/item?id=38101085 :-D
Insanity - love it
If you think this is insanity, I encourage you to look up the data expected to come out of the SKA. Even after several processing steps they expect several hundred PB/year (the raw data, which is not being archived, is several orders of magnitude more). That is only SKA-low; I think for SKA-mid we are talking exabytes per year. I recall their chief scientist saying that once they are operational they will process more data than Google and Facebook combined.
Yeppers: https://en.wikipedia.org/wiki/Square_Kilometre_Array

In-page search for "data challenges". Phew, that's a lot of data.