Hacker News
by parf02 29 days ago
Most people I know underutilize voice mode. It's such a game changer for making brain dumps the LLM can just gobble up.
4 comments

Still not useful enough for me and I really want this feature!

The problem I encounter is the inability of the LLM to look stuff up and respond to me. "What's the name of that database table?" "What are all the services that call this endpoint?" "Are there any open PRs for this repo right now?"

Once information can flow in both directions, not just one, it will be a game changer for me.

This works today, right? What part of this are you missing?
Not as far as I know. Can you walk me through the process?

I have my phone, I'm going for a walk, what app am I opening?

For Codex, that is ChatGPT? https://openai.com/index/work-with-codex-from-anywhere/

Or do you want it to speak to you too? I think this would have to be TTS on your phone. You can have ChatGPT speak to you but I don't see that feature in Codex.

Sure, I speak to ChatGPT all the time, and I've used the feature you linked, but it can't do the things I described. It won't be like, "hey, let me go look into that" and then come back in 3 minutes with an answer. It's essentially a dictation feature.
I am lost. Codex can't look up stuff for you in your codebase? GitHub Copilot can't look up PRs for you?
I hear a lot of praise. Can you explain? If I can type as fast as I speak, does it still matter in that case? Does it make use of intonation, or is it pure speech to text?
For me it's because I've been walking 10-20k steps per day on a walking pad at my standing desk while working.

I can still type while doing it but it's so nice to just hold a button (or double tap to lock recording) and just brain dump what I need.

I type very fast, but I feel like unaltered speech to text lets me express any uncertainty or doubt more quickly, especially when I don't know exactly how best to ask for what I want.

I also like to play dumb with agents and ask them for the best option for something I already know the answer to, just to see if they're gonna be correct. It helps get a sense of the limits of each model. That's also quicker on voice.

I don't think the built-in one makes use of intonation. I use Hex on Mac for speech to text.

By this pretty cool dude -> https://github.com/kitlangton/Hex

Why would I want OpenAI to gobble up my brain dumps?
These cloud LLMs are not the tool for you then, I suppose. There are local models too, unless your point is why use LLMs at all, in which case, you don't need to.
No, I do use them, for specific, targeted jobs with 'well' defined specs.
I don’t like that they don’t have realtime output. Would prefer Parakeet for that.