by slizard 2285 days ago
We're talking about Folding@Home not any (or all) distributed computing project(s). I've not seen recent stats on the size of the FAH network, so I'm not sure how it compares to large supercomputer resources, but plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500, let alone multiple orders of magnitude larger.

(Also, AFAIK garbage flops also count. As far as I recall from talking to a researcher working with FAH a while ago, a large-ish fraction of results returned were not usable due to data/files being broken.)

> We’re talking about Folding@Home not any (or all) distributed computing project(s).

Oh, that wasn’t really clear to me. Anyway, there is a close relationship between the SETI project and the Folding project, so this doesn’t seem like a weird stretch to me to compare them. Whatever SETI@home has achieved is evidence for what Folding@home might achieve.

> plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500

But that has little bearing on either what’s possible in the future, nor why they aren’t scheduling compute time on Summit, right?

Maybe the best answer to your original question is posted in the FAQ section on foldingathome.org: see “Why not just use a supercomputer” https://foldingathome.org/support/faq/project-details/

> Also AFAIK garbage flops also count.

That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

We can’t actually know how much ‘useful’ work is being done in any case; that depends on all kinds of things like what kind of problem is being solved, how data-parallel the problem is, how well the problem is even understood, what algorithms are being used, what bottlenecks there are on data & IO, and whether the implementers used Python or CUDA.

I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free, and Folding@Home is not a job you can run for 1 day or 1 week and be done. Nvidia or IBM could donate some time, but it won’t finish the project, so it might make no sense to donate supercomputer time, just like it makes no sense for Folding@Home to seek out or purchase supercomputer time.

> But that has little bearing on either what’s possible in the future,

There is certainly a chance that 10-100x more flops will appear on the FAH network, but as I said, I just don't believe it will happen.

> nor why they aren’t scheduling compute time on Summit, right?

They are. The researchers who get to use the FAH network certainly do have access to traditional supercomputing resources too. Some types of problems require strong scaling (and reliable resources), which in turn requires HPC iron.

> > Also AFAIK garbage flops also count.

> That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

I think you're misunderstanding me: I literally meant that the data files returned by FAH contributors may often be garbage due to data corruption (OC'd cards, no ECC, overheating hardware, poor storage etc.). Not sure to what extent this is still the case today.

> I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

In principle you're right. There are some questions around it, though. Briefly: access is a privilege of a few; it's unclear what oversight there is; and the hardware is used inefficiently (low flops/W), just to name a few.

> Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free,

No, it is not, but access is granted by grant agencies with at least some transparency and oversight as well as scientific review; also, those machines are far more efficient (flops/W).

> and Folding@Home is not a job you can run for 1 day or 1 week and be done.

No, Folding@Home is not the "job"; a "job", at least in the HPC sense, is a molecular dynamics simulation (or many). A set of such jobs is what is typically required to complete a project, the result of which would end up in a publication. The computational work corresponding to such a project/paper, depending on the size of the machine, can in fact be run in 1 week, and on a large machine even in 1 day.
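
To make the arithmetic behind that concrete, here is a rough sketch. Every number in it is made up for illustration (the project size, the machine throughputs and the usable fraction are assumptions, not measurements of FAH or of any real machine). It only shows how a fixed amount of work maps to wall-clock time, and how discarding a fraction of returned results stretches that time.

```python
SECONDS_PER_DAY = 86_400


def wall_clock_days(total_flop, sustained_flops, usable_fraction=1.0):
    """Days needed to finish a fixed amount of work.

    total_flop      -- total floating-point operations the project needs
    sustained_flops -- sustained (not peak) throughput, in flop/s
    usable_fraction -- share of returned results that are actually usable
    """
    if total_flop < 0:
        raise ValueError("total_flop must be non-negative")
    if sustained_flops <= 0:
        raise ValueError("sustained_flops must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Work that has to be thrown away does not advance the project.
    effective_flops = sustained_flops * usable_fraction
    return total_flop / effective_flops / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical project: one week of work at a sustained 1 Pflop/s.
    project_flop = 1e15 * 7 * SECONDS_PER_DAY

    # Hypothetical machines and usable fractions, for illustration only.
    print("1 Pflop/s, all results usable:", wall_clock_days(project_flop, 1e15))
    print("7 Pflop/s, all results usable:", wall_clock_days(project_flop, 7e15))
    print("1 Pflop/s, 80% usable:        ", wall_clock_days(project_flop, 1e15, 0.8))
```

With these made-up inputs the same project takes about a week on the smaller machine and about a day on the one seven times larger, while losing 20% of results adds roughly a quarter to the runtime. It deliberately ignores scaling efficiency, queue time and I/O, which matter a lot in practice.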