| @vishnumenon, thanks for writing this up. I might do a follow-up blog post on this; for now, here is a comment :) This article is close to my heart, as I've been working on chatcraft.org with a similar perspective. 1. Chat UX Consolidation: I agree that having crappy chat UIs everywhere is very suboptimal. Perhaps shipping a complete chat UX as a reusable component is another solution here.
We took many months to get http://chatcraft.org from a prototype to an ergonomic productivity tool. It is highly unlikely that such attention will be paid to every chat UI integration. 2. Persistence Across Uses: This one is tricky. We keep all of our history client-side...but after using it this way, having a shared history server-side and having it pulled in as relevant context would be a nice improvement. 3. Universal Access: It's super weird to have LLMs restricted to providing output that you cut/paste. We have integrated a pretty slick OpenAI function-calling interface that allows calling out to custom modules. So far we have integrated PDF conversion/ingestion, ClickHouse analytics, and system administration using a WebRTC<->shell connector. Demo here: https://www.youtube.com/watch?v=UNsxDMMbm64 I've also investigated teaching LLMs to consume UIs via accessibility interfaces. I think this is underexplored. Blog post on that here: https://taras.glek.net/post/gpt-aria-experiment/ 3b. Local LLMs: These have been underwhelming so far compared to the OpenAI ones (except maybe WizardCoder). The industry seems to be standardizing around an OpenAI-compatible REST interface (a la S3 clones). We have some support for this in a WIP pull request, but there is not much reason to finish it yet, as the local models are relatively weak for interactive use. 4. Dynamically Generated UI & Higher Level Prompting: I do a lot of exploration by asking http://chatcraft.org to generate some code and run it to validate an idea. A friend of mine built a basic UX for recruiting pipelines, where one can ingest resume PDFs into chatcraft and, via a custom system prompt, have chatcraft become a supervised recruiting automation. We also generate a lot of Mermaid architecture diagrams when communicating about code. I think there is a lot of room for UX exploration here. Now a few categories that weren't covered: 1. Multi-modal interaction: It's so nice to be able to chat with the assistant and then switch to voice while driving or to input some foreign language. 
I think extending the UX from chat to voice and even video-based gestures will make for an even cooler AI assistant experience. 2. Non-linearity in conversations: Bots are not human, so it makes sense to undo steps in a conversation, fork it, and re-run steps with different input params and different model params. Most of my conversations in chatcraft are me trying to beat the LLM into submission. Example: tuning a chain-of-density prompt https://www.youtube.com/watch?v=6Vj0zwP3uBs&feature=youtu.be Overall, really appreciate your blog post. Interesting to see how our intuitions overlap. |
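The non-linear editing described in the comment (undo a step, fork a conversation, re-run a reply with different model params) can be sketched as a small data structure. This is only a minimal Python sketch under assumptions, not how ChatCraft actually implements it: `Message`, `Conversation`, `rerun`, and `fake_complete` are hypothetical names, and `fake_complete` is a stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


@dataclass
class Conversation:
    """A message list that remembers which conversation it was forked from."""
    messages: List[Message] = field(default_factory=list)
    parent: Optional["Conversation"] = None

    def append(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))

    def undo(self, steps: int = 1) -> None:
        """Drop the last `steps` messages in place."""
        if steps > 0:
            del self.messages[-steps:]

    def fork(self, upto: Optional[int] = None) -> "Conversation":
        """Copy the first `upto` messages (all by default) into a new branch."""
        kept = [Message(m.role, m.content) for m in self.messages[:upto]]
        return Conversation(kept, parent=self)

    def rerun(self, complete: Callable[[List[Message], Dict], str],
              **params) -> "Conversation":
        """Fork just before the last assistant reply and regenerate it."""
        assistant_idx = [i for i, m in enumerate(self.messages)
                         if m.role == "assistant"]
        cut = assistant_idx[-1] if assistant_idx else len(self.messages)
        branch = self.fork(upto=cut)
        branch.append("assistant", complete(branch.messages, params))
        return branch


def fake_complete(messages: List[Message], params: Dict) -> str:
    """Hypothetical stand-in for a model call: echoes the last user message."""
    last_user = next(m.content for m in reversed(messages) if m.role == "user")
    return f"[temperature={params.get('temperature')}] reply to: {last_user}"


chat = Conversation()
chat.append("user", "Summarize this article densely.")
chat.append("assistant", fake_complete(chat.messages, {"temperature": 1.0}))

# Re-run the last reply with a different model param; the original is untouched.
retry = chat.rerun(fake_complete, temperature=0.2)
```

Keeping each fork as its own copy with a `parent` pointer means the original conversation survives every retry, so the two replies can be compared side by side.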