HN Mirror



	by v7n 598 days ago
	It's not exactly what OP wants out of the box, but if anyone is considering building one, I suggest taking a look at this.¹ It is really easy to tinker with and can run either on device or in a client-server model. It has the required speech-to-text and text-to-speech endpoints, with multiple built-in options for each. If you can get the LLM assistant part of the pipeline to perform translation to a degree you're comfortable with, this could be a solution. ¹ https://github.com/huggingface/speech-to-speech

1 comments

dmezzetti 598 days ago

A similar option exists with txtai (https://github.com/neuml/txtai).

https://neuml.hashnode.dev/speech-to-speech-rag

https://www.youtube.com/watch?v=tH8QWwkVMKA

One would just need to remove the RAG piece and use a Translation pipeline (https://neuml.github.io/txtai/pipeline/text/translation/). They'd also need to use a Korean TTS model.

Both this and the Hugging Face speech-to-speech projects are Python though.
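To make the suggested swap concrete, here is a minimal sketch of the speech-to-text, translation, text-to-speech chain. Every name in it is hypothetical, and the three stages are toy stand-ins rather than the txtai or Hugging Face speech-to-speech APIs; in a real build each callable would wrap an actual model.

```python
from typing import Callable


def speech_to_speech(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages: audio -> text -> translated text -> audio."""
    text = transcribe(audio)
    translated = translate(text)
    return synthesize(translated)


# Toy stand-ins (assumptions for illustration only, not real models).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # One-entry English-to-Korean lookup; unknown text passes through unchanged.
    return {"hello": "안녕하세요"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    # Pretend the synthesized "audio" is just the encoded text.
    return text.encode("utf-8")


result = speech_to_speech(b"hello", fake_transcribe, fake_translate, fake_synthesize)
```

Because the stages are plain callables, replacing the translation step or the TTS model touches only one argument and leaves the rest of the chain alone.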


authorfly 598 days ago

Your library is quite possibly the best example of effortful, understandable and useful work I have ever seen - principally evidenced by how you keep evolving with the times. I've seen you keep it up to date and even on the edge now for years and through multiple NLP mini-revolutions (sentence embeddings/new uses) and what must have been the annoying release of LLMs and still push on to have an explainable and useful library.

Code from txtai just feels like exactly the right way to express what I am usually trying to do in NLP.

My highest commendations. If you ever have time, please share your experience and what led you to take this path with txtai. For example, I see you started in earnest around August 2020 (maybe before). I would love to know whether, at that time, you imagined LLMs becoming as prominent as they are now and instruction-tuning working as well as it does. I know that at that time many NLP PhD students (and profs) I knew felt LLMs were far too unreliable and would not reach, e.g., consistent scores on MMLU/HellaSwag.


dmezzetti 598 days ago

I really appreciate that! Thank you.

It's been quite a ride from 2020. When I started txtai, the first use case was RAG in a way. Except instead of an LLM, it used an extractive QA model. But it was really the same idea, get a relevant context then find the useful information in it. LLMs just made it much more "creative".
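As a rough illustration of that "get a relevant context, then find the useful information in it" idea, here is a toy sketch. It uses word overlap in place of both the embeddings search and the extractive QA model, so it is an assumption-laden stand-in and not how txtai implements either step; all function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str]) -> str:
    """Step 1: pick the document sharing the most words with the question."""
    q = tokens(question)
    return max(documents, key=lambda d: len(q & tokens(d)))


def extract(question: str, context: str) -> str:
    """Step 2: pick the sentence in the context that best matches the question."""
    q = tokens(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    return max(sentences, key=lambda s: len(q & tokens(s)))


documents = [
    "txtai was started in 2020. It is written in Python.",
    "Semantic graphs link related records. Agents are planned for a later release.",
]
question = "When was txtai started?"
context = retrieve(question, documents)
answer = extract(question, context)
```

An LLM-based version keeps step 1 and replaces step 2 with a generated answer, which is where the extra "creativity" comes from.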

Right before ChatGPT, I was working on semantic graphs. ChatGPT took the wind out of the sails of that work for a while, until GraphRAG came along. Adding the LLM framework into txtai during 2023 was definitely a detour.

The next release will be a major release (8.0) with agent support (https://github.com/neuml/txtai/issues/804). I've been hesitant to buy into the "agentic" hype as it seems quite convoluted and complicated at this point. But I believe there are some wins available.

In 2024, it's hard to get noticed. There are tons of RAG and Agent frameworks. Sometimes you see something trend and surge past txtai in terms of stars in a matter of days. txtai has 10% of the stars of LangChain but I feel it competes with it quite well.

Nonetheless I keep chugging along because I believe in the project and that it can solve real-world use cases better than many other options.


okwhateverdude 597 days ago

I have a dozen or so tabs open at the moment to wrap my head around txtai and its very broad feature set. The plethora of examples is nice, even if the Python idioms are dense. The semantic graph bits are of keen interest for my use case, as are the pipelines and workflows. I really appreciate you continuing to hack on this.


dmezzetti 597 days ago

You got it. Hopefully the project continues its slow growth trajectory.
