by biomodel 2730 days ago
Not sure why anyone would use 2D CNNs for processing text when there is no spatial correlation in the embedding features. Recent work such as https://arxiv.org/abs/1803.01271 shows that for most tasks, 1D CNNs outperform recurrent architectures while being faster to train.
2 comments

Probably because the author followed this blog: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-cl...

That blog used a 2D CNN because TensorFlow didn't have a 1D version at the time of writing, so the author just created a dummy second dimension of length 1 and called it a day.
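
To make the dummy-dimension trick concrete, here is a rough pure-Python sketch (not the blog's actual code). It assumes the embedding features are treated as channels and that both convolutions use "valid" padding; the helper names `conv1d` and `conv2d` and the toy numbers are made up for illustration.

```python
def conv1d(x, w):
    """Valid 1D cross-correlation. x: [length][channels], w: [k][channels]."""
    k, channels = len(w), len(w[0])
    return [
        sum(x[t + i][c] * w[i][c] for i in range(k) for c in range(channels))
        for t in range(len(x) - k + 1)
    ]


def conv2d(x, w):
    """Valid 2D cross-correlation. x: [h][w][channels], w: [kh][kw][channels]."""
    kh, kw, channels = len(w), len(w[0]), len(w[0][0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(
                x[r + i][s + j][c] * w[i][j][c]
                for i in range(kh)
                for j in range(kw)
                for c in range(channels)
            )
            for s in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy "sentence": 5 tokens with 3 embedding features each (made-up values).
tokens = [[1, 0, 2], [0, 1, 1], [3, 1, 0], [1, 2, 1], [0, 0, 1]]
kernel = [[1, -1, 0], [2, 0, 1]]  # one filter spanning 2 consecutive tokens

out_1d = conv1d(tokens, kernel)

# Insert a dummy width axis of length 1 into both the input and the kernel.
tokens_2d = [[row] for row in tokens]  # shape [5][1][3]
kernel_2d = [[row] for row in kernel]  # shape [2][1][3]
out_2d = [row[0] for row in conv2d(tokens_2d, kernel_2d)]

# The 2D convolution has nothing to slide over along the dummy axis,
# so it computes exactly the same values as the 1D convolution.
assert out_1d == out_2d
```

Under these assumptions the two give identical outputs, so the extra axis changes the bookkeeping rather than the math.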

This is just a bug in their code. The paper they cite uses 1D convolutions. Though, I suppose having an unused dimension only really hurts efficiency.

> Though, I suppose having an unused dimension only really hurts efficiency.

That might not be true: the unused dimension might increase bias, and so might require more careful hyperparameter tuning to avoid overfitting.