LLaSA is a simple framework for speech synthesis that employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align with standard LLMs such as LLaMA.
Then the title should probably use that capitalization. 'Cause I was fully expecting a speech synthesis tool that sounded like llamas speaking human language, and now I'm bummed out!