by cuddlypsycho
2935 days ago
"...while avoiding the kind of silly errors made by the LSTM based recurrent neural architectures." Only on arXiv could you get away with that kind of language :). Good paper though! Kudos. "Another direction to go from here would be to increase the size of the context window during the data preprocessing stage to feed even more contextual information into the model." Could you comment on how the training time would scale as the context window grows? Is there a sweet spot?
The memory requirements of DNC are quite high. We used a GTX 1060 for training. Increasing the context window beyond 3 increases the sequence length by a huge amount, causing memory problems. However, we also found that DNC works quite well even with a small batch size. We used a batch size of 16 for all our experiments. The training time for a batch size of 16, a context window of size 3, and 200k steps is 48h on a GTX 1060 system.
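For a rough sense of what those numbers mean per step, here is a small back-of-the-envelope sketch in Python. It only uses the figures quoted above (48h, 200k steps, batch size 16) and assumes the 48h is wall-clock time spent entirely on those 200k steps; the function names are made up for illustration. It says nothing about how the time changes with a larger context window, since that was not measured.

```python
# Figures quoted in the reply above (context window of size 3, GTX 1060).
TRAIN_HOURS = 48
TRAIN_STEPS = 200_000
BATCH_SIZE = 16


def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on one training step."""
    return hours * 3600 / steps


def sequences_per_second(hours: float, steps: int, batch_size: int) -> float:
    """Average number of training sequences processed per second."""
    return batch_size / seconds_per_step(hours, steps)


# Assumption: the whole 48h went into the 200k training steps.
step_time = seconds_per_step(TRAIN_HOURS, TRAIN_STEPS)
throughput = sequences_per_second(TRAIN_HOURS, TRAIN_STEPS, BATCH_SIZE)
total_sequences = TRAIN_STEPS * BATCH_SIZE

print(f"{step_time:.3f} s/step")
print(f"{throughput:.1f} sequences/s")
print(f"{total_sequences:,} sequences seen in total")
```

So under that assumption each step takes a bit under a second, and the run sees 3.2 million sequences in total.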