
Yes, that Nature paper was exactly it, and thank you for the reference on ANN:SNN mapping. I'm not 100% clear on what you mean by the learning problem. Do you mean trying to run training on these physical neural nets rather than just inference? Or is training an SNN conventionally difficult for some reason?

And you are so right that it would be ridiculous. It's funny, before I read your comment this morning I was watching a livestream of a comedian talking to an AI-generated character ("Slunt"). I don't know the implementation details, but it would have been something simple: an open-source or commercial speech-to-text program, maybe even Whisper, then the text passed to the OpenAI API (probably GPT-4) with a prompt wrapper to set up the character and setting, then the response voiced with ElevenLabs or something like that. I can't link it as it was a livestream, but it was the same sort of thing that was used to make this demo: https://youtu.be/u_Zn89_g7ok

Anyway, the whole time I was watching her talk to this AI character, it was taking quite a while to recognise her voice, quite a while to respond, and so on, and I was thinking about what would be required for that interface to be truly conversational. It's just speed. If it were running even ten times faster, that would be closer, but a hundred times faster is probably what would be required for a genuinely conversational interface.

What you need is for your voice to be recognised and converted to text basically instantly, then for the LLM to go over it and respond basically instantly, then for the TTS program to start saying it basically instantly as well - and for the software wrapper to be ready to hear you if you interrupt it and respond to that interruption appropriately, or to interrupt you if it has something to add. That's what it would take to be natural conversation, because that's the only way you can interrupt it or be interrupted by it the way humans do when talking to each other, with the sort of responsiveness that makes it fluid rather than like an intercontinental phone call. I don't think we're going to get that sort of performance improvement any time soon by just continuing to scale regular silicon or ASICs. We need new, specific hardware.
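
To make the "wrapper ready to hear you if you interrupt it" part concrete, here's a minimal Python sketch of one conversational turn with barge-in. To be clear about the assumptions: `transcribe`, `generate_reply`, `speak` and `respond` are hypothetical stand-ins I made up for illustration, not real Whisper, OpenAI or ElevenLabs calls, and the sleeps are placeholders rather than measured latencies. It only shows the control flow; it does nothing about the speed problem itself.

```python
import asyncio


async def transcribe(audio: str) -> str:
    # Hypothetical speech-to-text stage; here the "audio" is already text.
    await asyncio.sleep(0.01)  # placeholder latency, not a measurement
    return audio


async def generate_reply(text: str) -> str:
    # Hypothetical LLM stage with the character prompt wrapper omitted.
    await asyncio.sleep(0.01)  # placeholder latency, not a measurement
    return f"reply to: {text}"


async def speak(reply: str, spoken: list) -> None:
    # Hypothetical TTS stage: emit one word at a time so playback
    # can be cut off mid-sentence when the task is cancelled.
    for word in reply.split():
        await asyncio.sleep(0.01)  # placeholder playback time per word
        spoken.append(word)


async def respond(audio: str, interrupt: asyncio.Event) -> list:
    """Run one turn and return the words actually spoken.

    Playback stops as soon as `interrupt` is set (the user barged in).
    """
    spoken = []
    reply = await generate_reply(await transcribe(audio))

    playback = asyncio.create_task(speak(reply, spoken))
    barge_in = asyncio.create_task(interrupt.wait())

    # Wake up when either playback finishes or the user interrupts.
    await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)

    # Cancel whichever task is still running, then let both settle.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return spoken
```

In a real system something like a voice-activity detector would set `interrupt`, and the words in `spoken` would tell the LLM how much of its reply the user actually heard before cutting in.
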
And I know a lot of people might think conversational ease of use isn't really important compared to the capabilities. There's an element of truth to that. But here's the thing: ChatGPT was primarily a UX/UI invention, not a technological one, and ChatGPT is what has driven this insane amount of interest, new use cases, and hype. GPT-3 was nearly as powerful; it was just much clunkier, and various other factors meant you couldn't just go and use it. Making it easier to use is what made it valuable enough that people actually wanted to use it for their problems. And it will go a long way beyond just making it conversational, too. The 2020s are going to be an absurd decade and we're not even halfway through.