by obblekk 990 days ago
RNNs are the correct solution, but infeasibly expensive to run.

A different way to think about it is that Transformer models are trying to predict which part of the RNN is "worth" keeping given a resource constraint.

Transformers use a simple heuristic today (and this result makes the heuristic better). As with many NP-complete problems, there might be approximations that are not perfectly correct but still useful. Transformers prove that this is the case for neural networks.