by enjeyw
934 days ago
If you're willing, I'd love your insight on the "why one might want to do this". Conceptually I understand embedding quantization, and I have some hint of why it works for things like WAV2VEC: human phonemes are (somewhat) finite, so forcing the representation to be finite makes sense. But I feel like there's a level of detail I'm missing about what's really going on, and about when quantization helps or harms, that I haven't been able to glean from papers.
But really, it's only useful if you absolutely need a discrete embedding space for some sort of downstream usage. VQ-VAEs can be difficult to get to converge, and they have problems stemming from the approximation of the gradient through the quantization step, like codebook collapse.
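To make "discrete embedding space" and "codebook collapse" a bit more concrete, here's a minimal pure-Python sketch, not taken from any particular VQ-VAE implementation. It assumes a tiny hand-picked codebook. `nearest_code`, `quantize` and `codebook_perplexity` are hypothetical helper names. The sketch maps each continuous vector to its nearest codebook entry and then measures how evenly the codes are used. A perplexity far below the codebook size means many codes are dead, which is what collapse looks like in practice.

```python
import math
from collections import Counter


def nearest_code(vec, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance to vec."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(vectors, codebook):
    """Map each continuous vector to a discrete token (its codebook index).

    In a real VQ-VAE this argmin is non-differentiable, so training usually
    copies the decoder's gradient straight through to the encoder output
    (the straight-through estimator). Codes that are never selected get no
    update from the codebook loss, which is one route to dead codes.
    """
    return [nearest_code(v, codebook) for v in vectors]


def codebook_perplexity(indices):
    """exp(entropy) of code usage: about len(codebook) if usage is even, about 1 if collapsed."""
    counts = Counter(indices)
    n = len(indices)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


# Hypothetical toy data: 4 codes, but the inputs only ever land near 2 of them.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-5.0, 5.0]]
vectors = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1], [1.1, 0.8]]

indices = quantize(vectors, codebook)
unused = sorted(set(range(len(codebook))) - set(indices))
print(indices, unused, codebook_perplexity(indices))
```

Here `indices` comes out as `[0, 1, 0, 1]`, codes 2 and 3 are never used, and the perplexity is about 2 out of a possible 4. During training you'd typically watch a number like this, because a collapsing codebook shows up as perplexity drifting toward 1.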