|
|
|
|
|
by joaohaas
234 days ago
|
|
So, if I got this right, this is just about re-implementing an existing load balancing algorithm faster...? If so, this is really dumb. As you guys checked out, yes most load balancing algorithms are slow/dumb: >First, we evaluate DeepSeek's open-source EPLB implementation. This employs a greedy bin-packing strategy: experts are sorted by load in descending order, and each is placed onto the least-loaded GPU that has capacity (Figure 3a, Example 1). While simple, the solution is slow because it written in Python and uses a for-loop to performs linear search for finding the best-fit GPU choice. This is because when considering a load balancing algorithm, unless the work being done (in this case by the GPU) lasts only a few ms, the load balancing algorithm being fast will never be the bottleneck. The post does not mention whether this is the case at all. Also, I don't want to sound rude, but if all they managed to get is a 5x increase over a simple python algorithm, I don't think this is impressive at all...? Any rewrite of the 'dumb' algorithm in a language with more memory control and cache continuity should result in much better results. |
|