HN Mirror


by huac 856 days ago

> 30b+ parameter model doing RAG as part of a conversation with voice responses in less than a second, running on Nvidia.

I believe that this is doable - my pipeline is generally closer to 400ms without RAG and with Mixtral, with a lot of non-ML hacks to get there. It would also definitely be doable with a joint speech-language model that removes the transcription step.

For these use cases, time to first byte is the most important metric, not total throughput.
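To make the distinction concrete, here is a minimal sketch of measuring time to first token separately from total time on a streamed response. `fake_token_stream` is a hypothetical stand-in for a real streaming model client and its delays are invented for illustration; only `measure_stream` would carry over to a real pipeline.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(n_tokens: int, first_delay: float,
                      per_token_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay)  # simulated queueing + prefill before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_delay)  # simulated decode time per later token
        yield f"tok{i}"


def measure_stream(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (seconds to first token, total seconds, token count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # An empty stream has no first token; report the total as a fallback.
    return (first if first is not None else total), total, count


if __name__ == "__main__":
    ttft, total, n = measure_stream(fake_token_stream(20, 0.05, 0.01))
    print(f"first token: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms, "
          f"throughput: {n / total:.0f} tok/s")
```

In a voice pipeline, speech synthesis can start as soon as the first tokens arrive, so the first number is the one the user perceives as latency.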


qeternity 856 days ago

It’s important…if you’re building a chatbot.

The most interesting applications of LLMs are not chatbots.


chasd00 856 days ago

> The most interesting applications of LLMs are not chatbots.

What are they then? Every use case I’ve seen is either a chatbot or like a copy editor which is just a long form chatbot.


jasonjmcghee 856 days ago

Obviously not op, but these days LLMs can be fuzzy functions with reliably structured output, and are multi-modal.

Think about the implications of that. I bet you can come up with some pretty cool use cases that don't involve you talking to something over chat.

One example:

I think we'll be seeing a lot of "general detectors" soon. Without training or predefined categories, get pinged when (whatever you specify) happens, whether the source is a security camera, web search, event data, etc.
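A rough sketch of what such a "fuzzy function" detector might look like, assuming the model is asked for JSON and the caller validates it before trusting it. Everything here is hypothetical: `detect`, `Detection`, the prompt wording, and especially `stub_complete`, which is a keyword toy standing in for a real model call so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    matched: bool
    reason: str


def detect(condition: str, event: str,
           complete: Callable[[str], str]) -> Detection:
    """Ask a text-completion callable whether `event` satisfies `condition`."""
    prompt = (
        "Decide whether the event satisfies the condition.\n"
        'Reply with JSON only: {"matched": true|false, "reason": "..."}\n'
        f"CONDITION: {condition}\n"
        f"EVENT: {event}"
    )
    raw = complete(prompt)
    # Validate the reply: treat anything off-schema as "no match".
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Detection(False, "unparseable model output")
    if not isinstance(data, dict) or not isinstance(data.get("matched"), bool):
        return Detection(False, "model output did not match schema")
    return Detection(data["matched"], str(data.get("reason", "")))


def stub_complete(prompt: str) -> str:
    """Toy stand-in for a model: matches on a keyword in the event text."""
    event_text = prompt.split("EVENT:", 1)[1]
    hit = "package" in event_text.lower()
    return json.dumps({"matched": hit, "reason": "keyword stub"})


if __name__ == "__main__":
    for event in ["Courier left a package at the door", "A cat walked by"]:
        print(event, "->", detect("a package is delivered", event, stub_complete))
```

The condition is plain text supplied at call time, which is the "no predefined categories" part; the schema check is what lets downstream code treat the result like an ordinary function return.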


nycdatasci 856 days ago

Complex data tagging/enrichment tasks.


throwaway2037 856 days ago

> The most interesting applications of LLMs are not chatbots.

In your opinion, what are the most interesting?
