Hacker News
by Lerc 108 days ago
There was a paper that proposed a content-based hashing mask for training.

The idea is you have some window size, maybe 32 tokens. Hash the window into a seed for a pseudo-random number generator. Generate a random number in the range 0..1 for each token in the window and compare it against a threshold. Don't count the loss for any token whose RNG value is higher than the threshold.
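A rough sketch of that masking step, to make it concrete. I don't have the paper's details, so everything specific here is an assumption: the name `loss_mask`, non-overlapping windows, SHA-256 as the hash, Python's built-in PRNG, and the 0.9 threshold are all placeholders.

```python
import hashlib
import random

def loss_mask(tokens, window=32, threshold=0.9):
    """Return one flag per token: 1 = count its loss, 0 = ignore it.

    Hypothetical sketch; window layout, hash, and threshold are assumptions.
    """
    mask = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        # Hash the window's content into a deterministic seed.
        digest = hashlib.sha256(",".join(map(str, chunk)).encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        # One draw in [0, 1) per token; drop the loss when the draw exceeds the threshold.
        mask.extend(0 if rng.random() > threshold else 1 for _ in chunk)
    return mask

# Example: token ids for a 100-token sequence.
mask = loss_mask(list(range(100)))
print(sum(mask), "of", len(mask), "tokens count toward the loss")
```

Since the seed depends only on the window's content, the same window gets the same mask every time it is seen, so the dropped tokens stay dropped across epochs.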

It learns well enough because you get the gist of reading the meaning of something when the occasional word is missing, especially if you are learning the same thing expressed many ways.

It can't learn verbatim, however. Anything it fills in will be semantically similar, but different enough to push any direct quoting onto another path after just a few words.

2 comments

> you get the gist of reading the meaning of something when the occasional word is missing,

I think it's more subtle than that. IIUC the tokens were all present for the purpose of computing the output and the score is based on the output. It's only the weight update where some of the tokens get ignored. So the learning is lossy but the inference driving the learning is not.
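If that reading is right, the loss side might look something like the sketch below. It is only an illustration: `masked_loss` and its inputs are made-up names, and it takes the model's probability for the correct token at each position as given rather than running a real model.

```python
import math

def masked_loss(correct_token_probs, mask):
    """Mean negative log-likelihood over positions whose mask flag is 1.

    Hypothetical sketch: correct_token_probs[i] is the probability the model
    assigned to the true token at position i, computed with full context.
    """
    # Every position still gets a prediction and a per-token loss.
    per_token = [-math.log(p) for p in correct_token_probs]
    # Only the kept positions enter the average that drives the weight update.
    kept = [loss for loss, keep in zip(per_token, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0

# The middle token is visible as context but excluded from the loss.
print(masked_loss([0.5, 0.25, 0.5], [1, 0, 1]))
```

The masked token's own prediction error never reaches the average, so it contributes no gradient, even though the token itself is still there as input for the positions after it.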

Rather than a book that's missing words it's more like a person with a minor learning disability that prevents him from recalling anything perfectly.

However it occurs to me that data augmentation could easily break the scheme if care isn't taken.

Yeah, it's a bit hard to describe what is happening, because the process doesn't really have a human analogue.

People have a difficult enough time dealing with how loss-reduction learning is or isn't 'seeing' the data. Selectively removing things from the loss while still feeding it all the data takes the non-intuitive situation one layer deeper.

That's partially why I described the hash & masking process. I understand it from a formulaic approach but I don't really feel like I have a good handle on what is happening semantically. It's like thinking in 5D: you can do the calculations but it still feels like your brain is not equipped to deal with what it means.

Thanks! Appreciate the response and will look into this