Hacker News
by danhon 44 days ago
"Looking at every public Airbnb listing in Inside Airbnb's open data dump, all at once, on Burla"

This Inside Airbnb?

Community Guidelines

Please:

Only take the data you need. Do not scrape data from the site, if you would like to subscribe to the data directly, please email data@insideairbnb.com

1 comments

>Everything was parallelized on Burla, on a single dynamic cluster that scaled to ~1.7K CPU workers for photo download and CLIP, with 20 A100 GPUs running embedding clusters in parallel on the same cluster.

That's a lot of budget - would have been nice if they'd made an actual donation to the project, instead of pounding the project's servers and bandwidth when there are much better ways to interact with the data.

Totally fair callout. I should’ve been more careful here and leaned on the provided datasets / bulk access instead of pulling things at scale. That’s on me.

I’ll make a donation to support the project regardless. Appreciate you raising it.

... so you'd only end up making a donation if you ended up "stressing the project's infra more than expected"?!