by angusturner
405 days ago
Hm, suppose for argument's sake that feeding a batch of data through some moderately large FF architecture takes on the order of 100ms (I realise this depends on a lot of parameters, but it seems reasonable for many tasks / networks). Now suppose instead you have a CTM that spends 10ms on the standard FF axes, and then multiplies that out by 10 internal “ticks” / recurrent steps. The exact numbers are contrived, but my point is: couldn’t we conceivably search over that second arch just as easily? It just boils down to whether the inductive bias of building in an explicit time axis is actually worthwhile, right?
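To spell out the back-of-envelope arithmetic, here is a rough sketch using the contrived numbers above. The helper `forward_cost_ms` is hypothetical, and it assumes the ticks run sequentially with equal cost per tick, which is a simplification rather than a measurement of any real model.

```python
def forward_cost_ms(per_step_ms: float, ticks: int = 1) -> float:
    """Approximate cost of one batch in milliseconds.

    Assumes ticks run one after another and each costs the same
    (an illustrative assumption, not a benchmark).
    """
    return per_step_ms * ticks


# Plain feed-forward net: one pass at roughly 100ms.
ff_cost = forward_cost_ms(100.0)

# Smaller per-tick net at roughly 10ms, unrolled over 10 internal ticks.
ctm_cost = forward_cost_ms(10.0, ticks=10)

print(f"FF: {ff_cost} ms, CTM: {ctm_cost} ms")
```

Under these assumptions both configurations land on the same per-batch budget, so the comparison comes down to which inductive bias spends that budget better.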