by thesz
244 days ago
Can it be that transformer-based solutions come from well-funded organizations that can spend vast amounts of money on training expensive (O(n^3)) models? Are there any papers that compare predictive power against the compute needed?

In many cases, I can't even see how many GPU hours, or what size cluster of which GPUs, the pretraining required. If I can't afford it, then it doesn't matter what it achieved. What I can afford is what I have to choose from.