by wegfawefgawefg 890 days ago
Sorry to upset you. It was not clear from your description that this was the process you were referring to. Others will read what you wrote and likely misunderstand as I did. (Which was my concern, because I've seen the "mixture of idiots" architecture attempted since 2015. Even now... It's a common misconception, and an argument every ML practitioner has at one point or another with a higher-up.)

As for your amendment, it is good to reduce compute when you can, and to reduce up-front effort for model creation when you can. Reusing models may be valid, but even in your amended process you will still fall short of the peak performance of a single end-to-end model trained on the right data. Composite models are simply worse, even when transfer learning is done correctly.

As for the compute cost, if you train an end-to-end model and then minify it to the same size as the sum of your composite models, it will have identical inference cost but higher peak accuracy.

You could even do that with the "Shared Backbone" architecture you've described, where two tail networks share a head network. It has been attempted thoroughly in the Deep Reinforcement Learning subdomain I am most familiar with, and it results in unnecessary performance loss. So it's not generally done anymore.
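
For anyone reading along who is unsure what layout is being argued about, here is a minimal sketch of a shared front network feeding two output branches. Everything in it is an assumption for illustration: the class name, the layer sizes, the bias-free layers and the random weights are all made up, and it is plain Python rather than any real framework. It only shows the wiring, and says nothing about which approach reaches higher accuracy.

```python
import random


def make_linear(n_in, n_out, rng):
    """Weight matrix (list of rows) for a bias-free linear layer."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


class SharedBackbone:
    """Hypothetical toy model: one shared network, two task branches."""

    def __init__(self, n_in, n_hidden, n_out_a, n_out_b, seed=0):
        rng = random.Random(seed)
        self.shared = make_linear(n_in, n_hidden, rng)       # shared part
        self.branch_a = make_linear(n_hidden, n_out_a, rng)  # task A output
        self.branch_b = make_linear(n_hidden, n_out_b, rng)  # task B output

    def forward(self, x):
        # The shared features are computed once and reused by both branches.
        features = relu(linear(self.shared, x))
        return linear(self.branch_a, features), linear(self.branch_b, features)

    def param_count(self):
        layers = (self.shared, self.branch_a, self.branch_b)
        return sum(len(row) for layer in layers for row in layer)


model = SharedBackbone(n_in=4, n_hidden=8, n_out_a=2, n_out_b=3)
out_a, out_b = model.forward([0.1, -0.2, 0.3, 0.4])
print(len(out_a), len(out_b), model.param_count())
```

The parameter count is the number that matters for the inference-cost comparison above: an end-to-end model shrunk to the same count would cost the same to run.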


Man, everyone at work is going to be really bummed when I tell them that some guy on the internet has invalidated our empirical evidence of acceptable accuracy and performance with assumptions and appeals to authority.

I did not say it would not work, nor that it couldn't give acceptable performance for a given task.

Just that its peak performance is lower than an end-to-end model's, and that if you're going to encourage model kit-bashing, be clear in how you communicate it, so people don't make human centipede architectures and wonder why feces is what comes out the end.

I was a polite enough "some guy on the internet". Thank you.