
by blop 362 days ago

I found this PDF presentation with lots of great technical details about the data management side of this telescope, from a devops/infra point of view: https://ci-compass.org/assets/602137/2025jan23_cicompass_rub...

Worth a read for the devops guys around here!

  - about 20 TB per day, around 100 PB expected for the whole survey
  - 0.5 PB Ceph cluster for local data
  - workloads on a 20-node Kubernetes cluster with Argo CD
  - physical infra managed with Puppet/Ansible
  - 100 Gb/s (+40 Gb/s backup) fiber connection to a US-based datacenter for further processing
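
Taking the figures in the list above at face value, a quick back-of-the-envelope check shows how they relate to each other. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), a constant 20 TB every single day, and that the full 100 Gb/s of the primary link is usable. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers from the list above.
# Assumptions (not from the slides): decimal units, constant daily volume,
# and the whole primary link available for the transfer.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400

daily_bytes = 20 * TB            # "about 20 TB per day"
survey_bytes = 100 * PB          # "around 100 PB expected"
link_bits_per_s = 100 * 10**9    # "100 Gb/s" primary link


def average_rate_gbps(bytes_per_day):
    """Average sustained rate, in Gb/s, needed to ship one day of data in one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def transfer_seconds(num_bytes, bits_per_s):
    """Time to push num_bytes through a link running flat out at bits_per_s."""
    return num_bytes * 8 / bits_per_s


def days_to_accumulate(total_bytes, bytes_per_day):
    """Days of constant-rate output needed to reach total_bytes."""
    return total_bytes / bytes_per_day


if __name__ == "__main__":
    print(f"average rate: {average_rate_gbps(daily_bytes):.2f} Gb/s")
    print(f"one day over the link: {transfer_seconds(daily_bytes, link_bits_per_s):.0f} s")
    print(f"days to reach 100 PB: {days_to_accumulate(survey_bytes, daily_bytes):.0f}")
```

Under those assumptions a day's worth of data averages out to under 2 Gb/s, so the link looks sized for bursts and catch-up rather than for the average load.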

3 comments

newpavlov 361 days ago

I wonder if they could reduce the data size at rest by using specialized compression techniques. You could probably build an averaged "model" of the sky observed by the telescope (probably accounting for stellar parallax and bright planets) and store only compressed diffs, not full images.

But I guess, since storage is relatively cheap, it's simply impractical to bother with such complexity.


xhkkffbf 361 days ago

There's quite a bit of black out there. That should compress easily.


newpavlov 361 days ago

The usual lossless image compression algorithms are a given. I am talking about compressing it further, since the telescope observes the same (or largely overlapping) patches of the sky and the most significant signal is stars, which are more or less "constant". At the very least, they could probably use lossless "animation" compression algorithms like APNG or FLIF for consecutive images of the same sky patch.
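
A toy sketch of the "store only compressed diffs" idea, to make it concrete. Everything here is an assumption for illustration: the frames are flat byte strings of 8-bit pixels, the template is random data standing in for a fixed star field, and zlib stands in for whatever codec would really be used. The helper names are hypothetical, and this is not how the telescope's pipeline works.

```python
import random
import zlib


def make_template(n_pixels, seed=0):
    """Hypothetical reference frame: random 8-bit pixels standing in for the static sky."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_pixels))


def make_exposure(template, n_changed, seed=1):
    """New exposure of the same patch: identical to the template except a few pixels."""
    rng = random.Random(seed)
    frame = bytearray(template)
    for idx in rng.sample(range(len(frame)), n_changed):
        # Add a non-zero offset so the pixel really differs from the template.
        frame[idx] = (frame[idx] + rng.randrange(1, 256)) % 256
    return bytes(frame)


def encode_diff(frame, template):
    """Store the per-pixel difference (mod 256) against the template, zlib-compressed."""
    diff = bytes((f - t) % 256 for f, t in zip(frame, template))
    return zlib.compress(diff, 9)


def decode_diff(blob, template):
    """Rebuild the exact frame from the compressed diff and the template."""
    diff = zlib.decompress(blob)
    return bytes((d + t) % 256 for d, t in zip(diff, template))


if __name__ == "__main__":
    template = make_template(10_000)
    frame = make_exposure(template, n_changed=50)
    blob = encode_diff(frame, template)
    assert decode_diff(blob, template) == frame  # lossless round trip
    print("plain zlib:", len(zlib.compress(frame, 9)), "bytes")
    print("diff + zlib:", len(blob), "bytes")
```

The toy is rigged in favour of the idea: real exposures carry fresh noise in every pixel, plus seeing and pointing changes, so the diff against a template would be far less sparse than it is here.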


aragilar 361 days ago

Look up fpack and funpack.


blop 361 days ago

Actually, the telescope devops guys were hiring a couple of years ago on HN: https://news.ycombinator.com/item?id=38101085 :-D


Melatonic 362 days ago

Insanity - love it


cycomanic 362 days ago

If you think this is insanity, I encourage you to look up the data expected to come out of the SKA. Even after several processing steps they expect several hundred PB/year (the raw data, which is not being archived, is several orders of magnitude more). That is only SKA-Low; I think for SKA-Mid we are talking exabytes per year. I recall their chief scientist saying that once they are operational they will process more data than Google and Facebook combined.


cb321 362 days ago

Yeppers: https://en.wikipedia.org/wiki/Square_Kilometre_Array

In-page search for "data challenges". Phew, that's a lot of data.
