Hacker News
by smoldesu 1224 days ago
I suspect something similar is possible with ChatGPT. Using the GPT-Neo 125M model, I've been able to get some really convincing (if lackluster) answers on 4-core ARM hardware with less than 2 GB of memory. With enough sampling, you can get legible paragraph-length responses in under 10 seconds; that's pretty good for an offline program in my book.
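
For anyone unsure what "sampling" means here, this is a rough sketch of one common decoding strategy, top-k sampling with temperature, in plain Python. It is not the rust-bert code path and not necessarily the setup described above; the function name `sample_top_k`, the parameter values, and the toy logits are all made up for illustration.

```python
import math
import random

def sample_top_k(logits, k=3, temperature=0.8, rng=None):
    """Pick one token index from `logits` using top-k sampling with temperature."""
    if rng is None:
        rng = random.Random()
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the kept logits
    # (the maximum is subtracted first for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]

# Toy logits over a 5-token vocabulary (hypothetical numbers).
logits = [2.0, 0.5, 1.5, -1.0, 0.1]
picks = [sample_top_k(logits, k=2, rng=random.Random(seed)) for seed in range(20)]
```

A real model would produce a fresh logits vector at every step and append the picked token to the prompt. Lower temperature or smaller k makes the output more predictable; higher values make it more varied.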

I'm using rust-bert to serve it over a Discord bot, similar to one of their examples[0]. It's running on Oracle vCPUs right now, but with dedicated hardware and ML acceleration I bet it would scream!

[0] https://github.com/guillaume-be/rust-bert/blob/master/exampl...

1 comment

Yes, this could serve as the conduit from voice input on an Android phone to a server-based ChatGPT (using the free Konele Android app as the frontend).
I just clicked through and noticed the client-server part. I'd be curious to see if a smaller Whisper model could run on an Android phone too... All the same, nicely done!
As mentioned at the bottom of the README, there is at least one Whisper port that runs natively on Android. It does not run as fast on an Android phone as on an iPhone (because of whisper.cpp's optimizations for Apple silicon), but it still runs pretty well. In my tests, it is slower than sending the raw audio across the network to a fast server for transcription there, which is what this post is about. But give it a try.
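
To make the "send the raw audio to a server" idea concrete, here is a minimal client-side sketch using only the Python standard library. It is not the project's actual protocol: the URL `http://localhost:8000/transcribe`, the helper names, and the choice of 16 kHz mono 16-bit WAV are all assumptions for illustration.

```python
import io
import urllib.request
import wave

def pcm_to_wav_bytes(pcm, sample_rate=16000):
    """Wrap raw 16-bit mono PCM samples in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

def build_transcribe_request(wav_bytes, url="http://localhost:8000/transcribe"):
    """Build (but do not send) an HTTP POST carrying the audio.

    The endpoint URL is hypothetical; point it at whatever server you run.
    """
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )

# One second of silence stands in for recorded microphone audio.
silence = bytes(2 * 16000)
wav_bytes = pcm_to_wav_bytes(silence)
request = build_transcribe_request(wav_bytes)

# To actually send it (requires a server listening at the URL):
# transcript = urllib.request.urlopen(request, timeout=30).read().decode()
```

The phone only has to record and upload; the heavy model stays on the server, which is why this can beat on-device transcription when the network is decent.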