by underlines 595 days ago
- LLaMA-Omni https://github.com/ictnlp/LLaMA-Omni a speech-language model built on Llama-3.1-8B-Instruct for simultaneous generation of text and speech

- Moshi https://github.com/kyutai-labs/moshi speech-text foundation model using Mimi, a SOTA streaming neural audio codec

- Mini-Omni https://github.com/gpt-omni/mini-omni multimodal LLM based on Qwen2 offering speech input and output

- Ichigo https://github.com/homebrewltd/ichigo open research project extending a text-based LLM to have native listening ability, using an early fusion technique