by biosboiii 534 days ago
Your second note is very interesting; having looked at the model myself, I find it very plausible.

For models that use a lot of input nodes and a lot of "hidden layers", and in the end just perform a softmax, this may become infeasible because of the amount of data you would have to transfer.

You may have inspired a second article :)