by creichenbach 3005 days ago
Ah, so your first suggestion would basically be building an auto-encoder trained on correct and incorrect words (fragments). This might work, but it would require a lot of computation: each word of the vocabulary multiplied by all its similar counterparts. And it still wouldn't cover new/unknown terms.

What we ended up doing for now is a two-dimensional input layer with per-column one-hot encoding of characters (i.e. one character is one column, 128 rows for the ASCII alphabet). Then we apply a convolution with kernel dimensions 3x128, which flattens the data to one dimension and combines three neighboring characters. Combining neighbors builds an "association" between them, which helps yield similar outputs for similar word fragments.
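To make the shapes concrete, here is a minimal sketch of that encoding and convolution in plain Python. It is not our actual code: `MAX_LEN = 16`, the helper names `one_hot_columns` and `conv_3x128`, and the random placeholder weights are all assumptions for illustration, and a real model would learn the kernel and typically use several of them.

```python
import random

ALPHABET_SIZE = 128   # one row per ASCII code, as described above
MAX_LEN = 16          # hypothetical fixed input width (number of columns)
KERNEL_WIDTH = 3      # the kernel covers three neighboring characters


def one_hot_columns(query, max_len=MAX_LEN):
    """Return max_len columns, each a 128-element one-hot list.

    Columns past the end of the query stay all-zero (right padding).
    """
    if len(query) > max_len:
        raise ValueError("query exceeds the fixed input width")
    columns = [[0.0] * ALPHABET_SIZE for _ in range(max_len)]
    for i, ch in enumerate(query):
        code = ord(ch)
        if code >= ALPHABET_SIZE:
            raise ValueError("non-ASCII character: %r" % ch)
        columns[i][code] = 1.0
    return columns


def conv_3x128(columns, kernel):
    """Slide a 3x128 kernel along the character axis (stride 1, no padding).

    kernel[k][r] is the weight for row r of the k-th column in the window.
    The kernel spans all 128 rows, so every window collapses to a single
    number and the output is one-dimensional.
    """
    out = []
    for start in range(len(columns) - KERNEL_WIDTH + 1):
        total = 0.0
        for k in range(KERNEL_WIDTH):
            col = columns[start + k]
            total += sum(w * x for w, x in zip(kernel[k], col))
        out.append(total)
    return out


# Placeholder weights (assumption); training would replace these.
rng = random.Random(0)
kernel = [[rng.uniform(-1, 1) for _ in range(ALPHABET_SIZE)]
          for _ in range(KERNEL_WIDTH)]

features = conv_3x128(one_hot_columns("search"), kernel)
print(len(features))  # 14, i.e. MAX_LEN - KERNEL_WIDTH + 1
```

Because the same kernel is applied at every position, the trigram "sea" yields the same value at position 0 of "search" as at position 2 of "research", which is the "similar outputs for similar fragments" effect in its simplest form.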

This works quite well, except for some nasty limitations:

- Search queries have a hard limit in length, caused by our input layer dimensions

- Due to varying search query length, input nodes on the right side are often unused/zero, leading to a weighting bias toward the left side when training. That is, the start of search queries receives more attention than the end. But that's not necessarily a bad thing.
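Both limitations can be seen with a quick count of how often each input column is actually non-zero. This is only an illustrative sketch: the width `MAX_LEN = 16`, the helper `column_usage`, and the three sample queries are made up, not taken from our data.

```python
MAX_LEN = 16  # hypothetical fixed input width (number of character columns)


def column_usage(queries, max_len=MAX_LEN):
    """Count, per input column, how many queries put a character there."""
    usage = [0] * max_len
    for query in queries:
        if len(query) > max_len:
            # First limitation: longer queries simply do not fit the input layer.
            raise ValueError("query longer than the input layer: %r" % query)
        for i in range(len(query)):
            usage[i] += 1
    return usage


sample_queries = ["cat", "search", "neural network"]  # made-up examples
usage = column_usage(sample_queries)
print(usage)  # [3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

The left-most columns are active for every query while the right-most ones rarely or never are, so the weights attached to the left side receive far more training signal.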