HN Mirror



	by amitport 1180 days ago
	Hmmm... seems like you're reinventing distributed learning. Mergeability does exist, and you can average the results.

2 comments

mirekrusin 1180 days ago

You can if you have the same base weights.

If you have similar variants of the same task, you can accelerate it further where the diff is.

You can't average past results computed from historic base weights: it's a linear process.

If you could do that, you'd just map training examples to diffs and merge them all.

Or take two distinct models and merge them to get a model that is roughly the sum of them. You can't do that; it's not a linear process.
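
A minimal sketch of both cases on a toy one-parameter problem, to make the distinction concrete. Everything here is an assumption for illustration: the double-well loss `(w^2 - t)^2`, the target `t` standing in for "data", the learning rate, and the helper names `grad`, `sgd_step` and `train`. It is not a claim about how any real model behaves, only a small case where the same effect shows up.

```python
# Toy illustration (hypothetical setup): averaging works from a shared base,
# but averaging two independently trained models can land somewhere bad.

def loss(w: float, t: float = 1.0) -> float:
    """Toy non-convex loss with two equally good minima at w = -sqrt(t) and +sqrt(t)."""
    return (w * w - t) ** 2

def grad(w: float, t: float = 1.0) -> float:
    """Derivative of the toy loss with respect to w."""
    return 4.0 * w * (w * w - t)

def sgd_step(w: float, g: float, lr: float = 0.05) -> float:
    """One plain gradient descent step."""
    return w - lr * g

def train(w: float, t: float = 1.0, steps: int = 200) -> float:
    """Serial training loop: each step depends on the output of the previous one."""
    for _ in range(steps):
        w = sgd_step(w, grad(w, t))
    return w

# Case 1: same base weights, one step each on different "data" (t = 0.9 vs t = 1.1).
base = 0.5
g_a, g_b = grad(base, t=0.9), grad(base, t=1.1)
avg_of_steps = 0.5 * (sgd_step(base, g_a) + sgd_step(base, g_b))
step_of_avg = sgd_step(base, 0.5 * (g_a + g_b))
# A single step is affine in the gradient, so the two agree up to rounding.

# Case 2: two models trained to convergence from different starting points.
model_a = train(0.5)    # settles in the minimum near +1
model_b = train(-0.5)   # settles in the minimum near -1
merged = 0.5 * (model_a + model_b)  # midpoint between the two minima

print("same base:", avg_of_steps, step_of_avg)
print("distinct models:", model_a, model_b, "merged:", merged)
print("losses:", loss(model_a), loss(model_b), loss(merged))
```

In case 1 the averaged weights match a single step taken with the averaged gradient. In case 2 each model has near-zero loss on its own, while the averaged model sits between the two minima where the toy loss is close to 1.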


mirekrusin 1180 days ago

I made bad use of words there: "it's a linear process" + "it's not a linear process" :)

Let me clarify:

It's a serialised, iterative, step-repeating process where each step depends on the output of the previous one, aka a linear process.

Where each step is a non-linear transformation (gradient descent).

It's not a distributable (over the internet) task because it'd require transferring gigabytes of data (the whole model's weights) on each step.

To put it in other words: the distributed task has massive input size, requires quick computation, and tasks arrive very frequently, which means it can't be distributed over the internet.


PeterisP 1180 days ago

Distributed learning sucks for this type of model. Averaging the results helps if you can do it often, which requires very high bandwidth, i.e. the InfiniBand interconnects between Nvidia pods, which go up to 200 Gbps.
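
A rough back-of-the-envelope sketch of the bandwidth argument. The 200 Gbps figure is the one quoted above; the 10 GB model size and the 100 Mbps home connection are assumptions picked only to make the arithmetic concrete, and `transfer_seconds` is a hypothetical helper. Latency, protocol overhead and any compression are ignored.

```python
def transfer_seconds(model_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to send the full set of weights once over a link."""
    return model_bytes * 8.0 / link_bits_per_second

MODEL_BYTES = 10e9       # assumption: 10 GB of weights ("gigabytes" in the thread)
HOME_LINK_BPS = 100e6    # assumption: 100 Mbps consumer internet connection
INFINIBAND_BPS = 200e9   # 200 Gbps, the interconnect figure quoted in the thread

home = transfer_seconds(MODEL_BYTES, HOME_LINK_BPS)
pod = transfer_seconds(MODEL_BYTES, INFINIBAND_BPS)

print(f"internet link: {home:.1f} s per full weight sync")
print(f"200 Gbps link: {pod:.2f} s per full weight sync")
print(f"ratio: {home / pod:.0f}x")
```

Under these assumptions a single full sync takes about 800 seconds over the home link versus about 0.4 seconds over the 200 Gbps link, which is why syncing on every step is only plausible inside a cluster.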
