by ActorNightly 395 days ago
You are right about the ordering of operations: that is where recurrent networks carry a whole bunch of extra computational complexity.

However, a Transformer, for example, can be represented with just densely connected layers, albeit with a lot of zeros for weights.
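
To make the "lot of zeros" point concrete, here is a minimal sketch in plain Python. It only covers the position-wise linear part of a Transformer block: the same small weight matrix applied to every token is equivalent to one big dense layer over the flattened sequence whose weight matrix is block-diagonal. The sizes and names (`seq_len`, `d_in`, `d_out`, `block_diagonal`) are made up for illustration. Attention is not covered, since its mixing weights depend on the input, so the comparison is looser there.

```python
import random

def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def block_diagonal(block, copies):
    """Dense matrix with `copies` copies of `block` on the diagonal, zeros elsewhere."""
    rows, cols = len(block), len(block[0])
    dense = [[0.0] * (cols * copies) for _ in range(rows * copies)]
    for c in range(copies):
        for i in range(rows):
            for j in range(cols):
                dense[c * rows + i][c * cols + j] = block[i][j]
    return dense

random.seed(0)
seq_len, d_in, d_out = 4, 3, 2  # hypothetical toy sizes

# One small weight matrix shared by every token (the structured view).
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
tokens = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(seq_len)]
per_token = [matvec(W, t) for t in tokens]

# The same computation as a single dense layer over the flattened sequence.
dense_W = block_diagonal(W, seq_len)
flat_in = [x for t in tokens for x in t]
flat_out = matvec(dense_W, flat_in)

# Compare the two views and measure how sparse the dense weights are.
flat_per_token = [y for out in per_token for y in out]
max_diff = max(abs(a - b) for a, b in zip(flat_per_token, flat_out))
total = len(dense_W) * len(dense_W[0])
zero_fraction = sum(v == 0.0 for row in dense_W for v in row) / total

print("max difference:", max_diff)
print("fraction of zero weights:", zero_fraction)
```

The dense matrix has `seq_len * d_out` rows and `seq_len * d_in` columns, but only the diagonal blocks are non-zero, so the zero fraction is `1 - 1/seq_len` and grows with sequence length.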