by tourist_on_road
1569 days ago
Transformers gained popularity because the architecture scales well and parallelizes efficiently on existing accelerator hardware (GPUs, and TPUs via XLA). Modeling is always conditioned on the hardware at hand.
Transformers lack strong inductive biases, which makes them generic building blocks, unlike CNN- or RNN-style models. By injecting an inductive bias such as positional encoding, they can be adapted to various domains.
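To make the positional encoding point concrete, here is a minimal sketch of the fixed sinusoidal encoding from the original Transformer paper, in plain Python. The function name and the example sizes are my own choices for illustration, and many models use learned or relative encodings instead, so treat this as one possible way to inject order information, not the only one.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following odd
    column holds the cosine of the same angle. Hypothetical helper name.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for paired sin/cos columns")

    table = []
    for pos in range(seq_len):
        row = []
        # i walks over the even column indices 0, 2, 4, ...
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# Assumed example sizes: 4 positions, 8-dimensional embeddings.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
```

The table is added element-wise to the token embeddings before the first attention layer. Without it, self-attention treats the input as an unordered set, which is the "no built-in bias" property being described.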
|
If not, then to me it seems that only industries with access to a large amount of representative data (say, more than a million samples?) benefit from transformers. In industries where data generation is a bottleneck, there's a clear benefit in leveraging the inductive biases of other architectures, such as the various ways CNNs are biased towards image recognition.
I'm in an industry (building energy consumption prediction) where we can only generate around 10,000 to 100,000 data points (from simulation engines) for DL. Are transformers ever used at that scale of data?