by anonymousDan 1116 days ago
Can anyone explain the architecture/how it works at a high level? I get that it is distributed. Does it basically copy the complete source tree to every worker and have them compile some independent subset of the object files? Does performance scale linearly with the number of worker nodes?
4 comments

It runs the preprocessor locally, then sends the preprocessed source out to a volunteer node. From https://www.distcc.org/distcc-lca-2004.html:

“The client is invoked as a wrapper around the compiler by Make. Because distcc is invoked in place of gcc, it needs to understand every pattern of command line invocations. If the arguments are such that the compilation can be run remotely, distcc forms two new sets of arguments, one to run the preprocessor locally, and one to run the compiler remotely. If the arguments are not understood by distcc, it takes the safe default of running the command locally. Options that read or write additional local files such as assembly listings or profiler tables are run locally”
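To make the "two new sets of arguments" idea concrete, here is a rough Python sketch of how a wrapper might split one compile command. It is not distcc's actual code: `plan_compile`, the option lists, and the `.i`/`.ii` file naming are all hypothetical stand-ins, and the real client recognises far more command-line patterns.

```python
import os

# Hypothetical examples of options that touch extra local files.
# These are illustrative stand-ins, not distcc's actual rules.
LOCAL_ONLY_OPTIONS = {"-save-temps", "-fprofile-arcs", "-ftest-coverage"}
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx"}
# Options only the preprocessor needs, so the remote compile can drop them.
PREPROCESSOR_PREFIXES = ("-I", "-D", "-U")


def plan_compile(argv):
    """Return ("remote", cpp_args, remote_args) or ("local", argv, None)."""
    compiler, args = argv[0], list(argv[1:])
    local = ("local", list(argv), None)

    sources = [a for a in args if os.path.splitext(a)[1] in SOURCE_EXTENSIONS]
    # Safe default: anything that is not a plain single-file "-c" job runs locally.
    if "-c" not in args or len(sources) != 1:
        return local
    if any(a in LOCAL_ONLY_OPTIONS for a in args):
        return local

    source = sources[0]
    stem, ext = os.path.splitext(source)
    preprocessed = stem + (".i" if ext == ".c" else ".ii")
    output = os.path.basename(stem) + ".o"  # compiler default when -o is absent

    cpp_args = [compiler, "-E"]
    remote_args = [compiler, "-c"]
    it = iter(args)
    for arg in it:
        if arg in ("-c", source):
            continue
        if arg == "-o":
            output = next(it, None)
            if output is None:  # malformed command line: do not guess
                return local
            continue
        if arg.startswith(PREPROCESSOR_PREFIXES):
            cpp_args.append(arg)
            if len(arg) == 2:  # separated form such as "-I dir"
                value = next(it, None)
                if value is None:
                    return local
                cpp_args.append(value)
            continue
        # Everything else (e.g. -O2, -Wall) goes to both stages.
        cpp_args.append(arg)
        remote_args.append(arg)

    cpp_args += [source, "-o", preprocessed]
    remote_args += [preprocessed, "-o", output]
    return ("remote", cpp_args, remote_args)
```

For `gcc -O2 -Iinclude -DNDEBUG -c src/foo.c -o build/foo.o` this yields a local `-E` command that keeps the include and define options, and a remote `-c` command that only needs the preprocessed file. A link step such as `gcc main.o -o app` falls through to the local default.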

Scalability:

“Reports from users indicate that distcc is nearly linearly scalable for small numbers of CPUs. Compiling across three identical machines is typically 2.5 to 2.8 times faster than local compilation. Builds across sixteen machines have been reported at over ten times faster than a local build. These numbers include the overhead of distcc and Make, and the time for non-parallel or non-distributed tasks”
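As a quick sanity check on "nearly linear", the quoted figures can be turned into parallel efficiency (speedup divided by machine count). This is just arithmetic on the numbers above; `parallel_efficiency` is a helper name made up for the example, and the sixteen-machine figure is a lower bound because the quote says "over ten times".

```python
def parallel_efficiency(speedup, machines):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / machines


# (machines, reported speedup) pairs taken from the quoted paragraph.
reports = [(3, 2.5), (3, 2.8), (16, 10.0)]

for machines, speedup in reports:
    eff = parallel_efficiency(speedup, machines)
    print(f"{machines} machines at {speedup}x: {eff:.1%} of linear")
```

So three machines land at roughly 83% to 93% of linear, while sixteen machines are at 62.5% or better, which fits the "linear for small N" summary.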

Last time I looked, it basically ran per-file "cc -E" on the source machine to get a compilation unit (optionally checking for a ccache cache hit at this point), then piped the result to "cc" running on the target machine, and copied the resulting object file back.

> Does performance scale linearly with the number of worker nodes

Yes, for small N.

Overall scaling was limited by how much "make -j" the source machine could cope with.

This was the original approach, although later work added an optional "pump mode" in which headers are distributed: https://manpages.ubuntu.com/manpages/bionic/man1/distcc-pump...
Per the description:

> distcc sends the complete preprocessed source code across the network for each job, so all it requires of the volunteer machines is that they be running the distccd daemon, and that they have an appropriate compiler installed.

So all the "environment" is on the source machine and just a bare compiler is required on the remote machines for compilation.

From section 3 (Design) of https://www.distcc.org/distcc-lca-2004.html:

'''

distcc distributes work from a client machine to any number of volunteer machines. (Since this is free software, we prefer the term volunteer to the slaves that are used by other systems.)

distcc consists of a client program and a server. The client analyzes the command run, and for jobs that can be distributed it chooses a host, runs the preprocessor, sends the request across the network and reports the results. The server accepts and handles requests containing command lines and source code and responds with object code or error messages.

'''
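The client/server split described in that excerpt can be sketched in a few lines of Python. Everything here is hypothetical: the `Request`/`Response` types, `serve`, `run_job`, and the round-robin host choice are assumptions made for illustration, not distcc's wire protocol or scheduling policy. The preprocessor, compiler, and network are passed in as plain functions so the sketch runs without any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Request:
    argv: List[str]  # compiler command line to run on the volunteer
    source: bytes    # complete preprocessed source


@dataclass
class Response:
    ok: bool
    object_code: bytes = b""
    errors: str = ""


def serve(request: Request,
          compile_fn: Callable[[List[str], bytes], bytes]) -> Response:
    """Volunteer side: compile what was received, reply with object code or errors."""
    try:
        return Response(ok=True,
                        object_code=compile_fn(request.argv, request.source))
    except Exception as exc:  # stand-in for relaying the compiler's error output
        return Response(ok=False, errors=str(exc))


def run_job(argv: Sequence[str],
            source_text: str,
            hosts: Sequence[str],
            preprocess_fn: Callable[[str], bytes],
            send_fn: Callable[[str, Request], Response],
            job_index: int = 0) -> Tuple[str, Response]:
    """Client side: choose a host, preprocess locally, send, report the result."""
    # Assumption: simple round-robin, chosen only to keep the sketch short.
    host = hosts[job_index % len(hosts)]
    request = Request(argv=list(argv), source=preprocess_fn(source_text))
    return host, send_fn(host, request)
```

With a fake preprocessor and compiler plugged in, `run_job` returns the chosen host plus a `Response` carrying either the "object code" or the error text, which mirrors the request/response shape the paper describes.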