Hacker News
by pmalynin 3348 days ago
I have no hard numbers to present, but it should help long-sequence LSTM networks, because TensorFlow has to transpose between time-major and batch-major layouts between steps.
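
To make the layout point concrete, here is a rough sketch in plain Python of what the two layouts look like and what converting between them involves. This is not TensorFlow's code; the function names and the toy data are made up for illustration, and nested lists stand in for tensors.

```python
def batch_to_time_major(x):
    """Convert [batch][time][features] nested lists to [time][batch][features]."""
    # zip(*x) groups the t-th step of every sequence together.
    return [list(step) for step in zip(*x)]


def time_to_batch_major(x):
    """Inverse conversion: [time][batch][features] -> [batch][time][features]."""
    # Swapping the two outer axes is its own inverse.
    return [list(seq) for seq in zip(*x)]


# Hypothetical toy input: 2 sequences, 3 time steps, 1 feature per step.
batch_major = [
    [[1.0], [2.0], [3.0]],     # sequence 0
    [[10.0], [20.0], [30.0]],  # sequence 1
]

time_major = batch_to_time_major(batch_major)
# time_major[t] holds step t for every sequence in the batch,
# which is the slice a recurrent step consumes.

round_trip = time_to_batch_major(time_major)
```

Every element gets moved in each conversion, so the cost grows with sequence length, which is why long sequences are where I would expect this to matter.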