by wegfawefgawefg
890 days ago
Sorry to upset you. It was not clear from your description that this was the process you were referring to. Others will read what you wrote and likely misunderstand it as I did. (That was my concern, because I've seen the "mixture of idiots" architecture attempted since 2015. Even now, it's a common misconception, and an argument every ML practitioner has with a higher-up at one point or another.)

As for your amendment: it is good to reduce compute when you can, and to reduce the up-front effort of model creation when you can. Reusing models may be valid, but even with your amended process you will still not reach the peak performance of a single end-to-end model trained on the right data. Composite models are simply worse, even when transfer learning is done correctly.

As for the compute cost: if you train an end-to-end model and then minify it to the same size as the sum of your composite models, it will have identical inference cost but higher peak accuracy. You could even do that with the "Shared Backbone" architecture you've described, where two tail networks share a head network. That architecture has been attempted thoroughly in the deep reinforcement learning subdomain, the one I am most familiar with, and it resulted in unnecessary performance loss. So it's not generally done anymore.
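
To make the inference-cost point a bit more concrete, here is a rough sketch that counts multiply-accumulates (MACs) for one forward pass of plain dense networks. The layer sizes and names (`mlp_macs`, `shared`, `tail_a`, `tail_b`) are made up for illustration, and biases and activations are ignored. It only shows that cost is fixed by the layer sizes, so any model squeezed into the same budget costs the same to run. It says nothing about accuracy.

```python
def mlp_macs(layer_sizes):
    """Multiply-accumulates for one forward pass of a dense MLP (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical layer sizes, chosen only for illustration.
shared = [64, 128, 128]   # the shared part: 64 inputs -> 128 features
tail_a = [128, 32, 4]     # task A network on top of those features
tail_b = [128, 32, 2]     # task B network on top of those features

# Two fully separate models: each one carries its own copy of the shared part.
separate_macs = (mlp_macs(shared) + mlp_macs(tail_a)) + (mlp_macs(shared) + mlp_macs(tail_b))

# Shared backbone: the shared part runs once and both tails read its output.
shared_macs = mlp_macs(shared) + mlp_macs(tail_a) + mlp_macs(tail_b)

print("separate models:", separate_macs, "MACs")
print("shared backbone:", shared_macs, "MACs")
```

Whichever of those two numbers you treat as your budget, an end-to-end model minified to that many MACs has the same per-inference cost by construction. The only open question is which one is more accurate at that size.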