Hacker News
by slizard 2284 days ago
> But Folding@Home is setup for home,

Nothing prevents running the fah client on nodes of a compute cluster -- in fact my colleagues did that (while running a local F@H server), though that was a number of years ago, just because they wanted to take advantage of the distributed computing facilities provided by the client-server setup and built-in algorithms.

> crowd sourcing has the potential to scale to many orders of magnitude larger than what can be done in a data center.

Potential it does have, but I am skeptical of the "many orders of magnitude" claim ever having a chance to materialize. I'd love to see a cost / benefit analysis on the effective amount of useful work contributed vs the cost of the same in a data center.

> I am skeptical of the "many orders of magnitude" claim ever having a chance to materialize.

The many orders of magnitude has already materialized https://en.wikipedia.org/wiki/SETI@home#Statistics

“On September 26, 2001, SETI@home had performed a total of 1021 floating point operations. It was acknowledged by the 2008 edition of the Guinness World Records as the largest computation in history.[22] With over 145,000 active computers in the system (1.4 million total) in 233 countries, as of 23 June 2013, SETI@home had the ability to compute over 668 teraFLOPS.[23] For comparison, the Tianhe-2 computer, which as of 23 June 2013 was the world's fastest supercomputer, was able to compute 33.86 petaFLOPS (approximately 50 times greater).”

"On September 26, 2001, SETI@home had performed a total of 1021 floating point operations."

Just to clarify, I think HN formatting ate a caret here (now dang can see in the dark!) and it's supposed to be "10 to the 21"; either that or floating point math is much harder than I remember.

No caret was in the original so none was eaten - it's just a superscript.
Indeed, that would mean a staggering 1.618×10⁻⁵ FLOPS.
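For anyone checking the arithmetic, here is a rough sketch. It assumes the total was accumulated over about two years; that duration is my assumption (the quote gives no elapsed time), picked because it roughly reproduces the figure above. Leap days are ignored.

```python
# Average rate implied by a total operation count spread over an elapsed time.
# ASSUMPTION: roughly two years of runtime; the quoted text gives no duration.
SECONDS_PER_YEAR = 365 * 24 * 3600
ASSUMED_YEARS = 2


def average_flops(total_ops: float, years: float = ASSUMED_YEARS) -> float:
    """Return the mean floating point operations per second."""
    return total_ops / (years * SECONDS_PER_YEAR)


literal = average_flops(1021)   # reading "1021" literally
intended = average_flops(1e21)  # reading it as 10 to the 21

print(f"literal reading:  {literal:.3e} FLOPS")
print(f"intended reading: {intended / 1e12:.1f} TFLOPS")
```

Under the same two-year assumption, the "10 to the 21" reading works out to an average in the low tens of teraFLOPS.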
This is 2013, with most of those computers likely even older than 2013 and likely single-core, vs 3,120,000 brand new cores on the supercomputer. So, in terms of “raw” flops, it’s just a question of more and newer hardware on the supercomputer, not really a better architecture.
Good point! With hand-wavy napkin math, that looks like roughly 2x more flops/core for the Tianhe-2, which is unsurprising for the latest supercomputer vs a rando home computer. Maybe it's even surprising the number isn’t higher...
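The napkin math can be sketched like this, using only the figures from the quoted Wikipedia passage and the core count given above. It treats each active SETI@home computer as a single core, which is the guess made in the parent comment rather than a known fact, and it takes the "over 145,000" and "over 668 teraFLOPS" figures as exact.

```python
# Figures as quoted in the thread (as of 23 June 2013).
SETI_FLOPS = 668e12              # 668 teraFLOPS
SETI_ACTIVE_COMPUTERS = 145_000  # ASSUMPTION: treated as one core each
TIANHE2_FLOPS = 33.86e15         # 33.86 petaFLOPS
TIANHE2_CORES = 3_120_000

per_computer = SETI_FLOPS / SETI_ACTIVE_COMPUTERS  # flops per home computer
per_core = TIANHE2_FLOPS / TIANHE2_CORES           # flops per Tianhe-2 core

per_unit_ratio = per_core / per_computer           # Tianhe-2 core vs home computer
total_ratio = TIANHE2_FLOPS / SETI_FLOPS           # whole machine vs whole network

print(f"SETI@home: {per_computer / 1e9:.2f} GFLOPS per computer")
print(f"Tianhe-2:  {per_core / 1e9:.2f} GFLOPS per core")
print(f"per-unit ratio: {per_unit_ratio:.2f}x, total ratio: {total_ratio:.1f}x")
```

The per-unit ratio comes out a bit above 2x and the total ratio at about 50x, matching the "approximately 50 times greater" in the quote.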

It’s probably worth noting that SETI@home has partitioned the problem space so that it’s “embarrassingly parallel”, as they say. I think raw flops are a good metric in this case, whereas raw flops are a very bad metric for some other problems.

We're talking about Folding@Home not any (or all) distributed computing project(s). I've not seen recent stats on the size of the FAH network, so I'm not sure how it compares to large supercomputer resources, but plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500, let alone multiple orders of magnitude larger.

(Also, AFAIK garbage flops also count. As far as I recall from talking to a researcher working with FAH a while ago, a large-ish fraction of the results returned were not usable due to data/files being broken.)

> We’re talking about Folding@Home not any (or all) distributed computing project(s).

Oh, that wasn’t really clear to me. Anyway, there is a close relationship between the SETI project and the Folding project, so this doesn’t seem like a weird stretch to me to compare them. Whatever SETI@home has achieved is evidence for what Folding@home might achieve.

> plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500

But that has little bearing on either what’s possible in the future, nor why they aren’t scheduling compute time on Summit, right?

Maybe the best answer to your original question is posted in the FAQ section on foldingathome.org: see “Why not just use a supercomputer” https://foldingathome.org/support/faq/project-details/

> Also AFAIK garbage flops also count.

That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

We can’t actually know how much ‘useful’ work is being done in any case; that depends on all kinds of things, like what kind of problem is being solved, how data-parallel the problem is, how well the problem is even understood, what algorithms are being used, what bottlenecks there are on data & IO, and whether the implementers used Python or CUDA.

I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free, and Folding@Home is not a job you can run for 1 day or 1 week and be done. Nvidia or IBM could donate some time, but it won’t finish the project, so it might make no sense to donate supercomputer time, just like it makes no sense for Folding@Home to seek out or purchase supercomputer time.

> But that has little bearing on either what’s possible in the future,

There is certainly a chance that 10-100x more flops will appear on the fah network, but as I said I just don't believe it will happen.

> nor why they aren’t scheduling compute time on Summit, right?

They are. The researchers who get to use the FAH network certainly do have access to traditional supercomputing resources too. Some types of problems require strong scaling (and reliable resources) which requires HPC iron.

> > Also AFAIK garbage flops also count.

> That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

I think you're misunderstanding me: I literally meant that the data files returned by the FAH contributors may often be garbage due to data corruption (OC'd cards, no ECC, overheating hardware, poor storage etc.). Not sure to what extent this is still the case today.

> I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

In principle you're right. There are some questions around it, though. Briefly: access is a privilege of the few; it's unclear what oversight there is; and the hardware is used inefficiently (low flops/W), just to name a few.

> Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free,

No, it is not, but access is granted by grant agencies with at least some transparency and oversight as well as scientific review; also, those machines are far more efficient (flops/W).

> and Folding@Home is not a job you can run for 1 day or 1 week and be done.

No, Folding@Home is not the "job"; a "job", at least in the HPC sense, is a molecular dynamics simulation (or many). A set of such jobs is what is typically required to complete a project, the result of which would end up in a publication. The computational work corresponding to such a project/paper, depending on the size of the machine, can in fact be run in 1 week, and on a large machine even in 1 day.

Are they comparing SETI@home in 2001 vs a supercomputer in 2013? Looks unfair.
On second thought, projects like Exscalate4CoV [1] might have a chance to contribute even in the short term, as they do have the explicit goal to also focus on "Identify virtually and quickly the drugs available, or at an advanced stage of development, potentially effective"

[1] https://www.cineca.it/en/news/exscalate4cov-hpc-platform-win...

No, they aren’t. Read it again.

In 2001 SETI@home had more flops than the best supercomputer of the time. In 2013, SETI@home had fewer flops than the best supercomputer of the time.