|
|
|
|
|
by abmfy
234 days ago
|
|
Thanks for commenting! Actually in this case, "the work being done" can be really fast because it can be done asynchronously. For context, here’s how this translates in a real-world application. The original algorithm was provided by DeepSeek, and our optimized implementation achieves a 92× speedup over it. The 5x number is comparing with another baseline that is undisclosed yet. When integrating EPLB into vLLM, I discovered—somewhat unexpectedly—that the open-source algorithm consumes nearly half of the total time of a rearrangement step, with the remaining time spent transferring weights across GPUs. To address this, I applied OpenEvolve to the algorithm, setting the primary objective to improve speed while maintaining the same balance factor. It performed remarkably well. With additional optimizations on the weight transferring, the overall overhead has now become almost negligible. |
|
e: also comparison a fixed (nothing faster than 0!) and random policy might be informative if your intent is to publish this as improvement for the object problem, not just a demonstration of ARDS.