by boroboro4 434 days ago
The expert chosen at each layer depends on the output of the previous layer, so the routing for layer N isn't known until layers 1..N-1 have actually run. Not sure how you can preload the experts before the forward pass.
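
To make the dependency concrete, here's a toy sketch of top-1 routing in plain Python. It isn't any real model's code: `make_model`, `forward`, the random weights, and the sizes (3 layers, 4 experts, dim 8) are all made up for illustration. The point is just that each layer's router reads the hidden state produced by the layer before it.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def make_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_model(seed=0, n_layers=3, n_experts=4, dim=8):
    """Hypothetical toy MoE: each layer has a router and a set of experts."""
    rng = random.Random(seed)
    return [
        {
            "router": make_matrix(rng, n_experts, dim),
            "experts": [make_matrix(rng, dim, dim) for _ in range(n_experts)],
        }
        for _ in range(n_layers)
    ]


def forward(model, x):
    """Run the toy model, returning the expert picked at each layer."""
    chosen = []
    h = x
    for layer in model:
        # The router scores are computed from h, i.e. the previous layer's
        # output, so this choice cannot be made before that layer has run.
        scores = matvec(layer["router"], h)
        idx = max(range(len(scores)), key=scores.__getitem__)
        chosen.append(idx)
        # Only the selected expert's weights are needed for this token.
        h = [math.tanh(v) for v in matvec(layer["experts"][idx], h)]
    return chosen, h


if __name__ == "__main__":
    model = make_model()
    for token in ([0.1 * i for i in range(8)], [(-0.3) ** i for i in range(8)]):
        route, _ = forward(model, token)
        print(route)
```

You can know the first layer's expert up front (it only needs the input), but every later choice is only available once the earlier experts have been applied.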