Hacker News new | ask | show | jobs
by tarasglek 998 days ago
@vishnumenon, thanks for writing this up. I might do a followup blog on this, for now here is a comment :)

This article is close to my heart as I've been working on craftcraft.org with similar perspective.

1. Chat UX Consolidation: I agree that having crappy chat UIs everywhere is very suboptimal. Perhaps having a complete UX as a component is another solution here. We took many months to get http://chatcraft.org from prototype to an ergonomic productivity tool. Highly unlikely such attention will be paid to every chat UI integration.

2. Persistence Across Uses. This one is tricky. We keep all of our history client-side...but after using it this way, having a shared history server-side and having it pulled in as relevant context would be a nice improvement.

3. Universal Access: It's super weird to have LLMs restricted to providing output that you cut/paste. We have integrated pretty slick openai function interface to allow calling out to custom modules. So far we integrated: pdf conversion/ingestion, clickhouse analytics and system administration using a webrtc<->shell connector. Demo here: https://www.youtube.com/watch?v=UNsxDMMbm64

I've also investigated teaching LLMs consume UIs via accessibility UIs. I think this is underexplored. Blog post on that here: https://taras.glek.net/post/gpt-aria-experiment/

3b. LocalLLMs. These have been underwhelming so far vs openai ones(except maybe WizardCoder). Industry seems to be standardizing around openai-compatible REST interface(ala S3 clones). We have some support for this in a wip pull req, but not much reason to do that yet as the local models are relatively weak for interactive use.

4. Dynamically Generated UI & Higher Level Prompting: I do a lot of exploration by asking http://chatcraft.org to generate some code and run it to validate some idea. Friend of mine built basic UX for recruiting pipelines, where one can ingest resume pdfs into chatcraft and via custom system prompt have chatcraft become a supervised recruiting automation. We also do a lot generation of mermaid architecture diagrams when communicating about code. I think there a lot of room for UX exploration here.

Now a few categories that weren't covered:

1. Multi-modal interaction: It's so nice to be able to have chat with the assistant and then switch to voice while driving or to input some foreign language. I think extending UX from chat to voice and even video-based gestures will make for an even cooler AI assistant experience.

2. Non-linearity in conversations: Bots are not human, so it makes sense to undo steps in conversation, fork them, re-run them with different input params and different model params. Most of my conversations in chatcraft are me trying to beat llm into submission. Example: tuning chan-of-density prompt https://www.youtube.com/watch?v=6Vj0zwP3uBs&feature=youtu.be

Overall, really appreciate your blog post. Interesting to see how our intuition overlaps.

2 comments

Really good stuff, but some minor things: your url doesn't work; went to your twitter profile, and it seems you meant https://chatcraft.org? Also, you are un-dm-able on twitter. (I am @eating_entropy if you want to talk more)
that's the right link :)

would be great to chat on discord https://discord.gg/JsVe9ZuZCn

(updated discord link)

The discord link seems to be not working. Just a heads up.

The YOLO example on your Github page is super interesting. We are finding it easier to get LLMs to write functions with a more constrained function interface in EvaDB. Here is an example of an YOLO function in EvaDB: https://github.com/georgia-tech-db/evadb/blob/staging/evadb/....

Once the function is loaded, it can be used in queries in this way:

  SELECT id, Yolo(data)
     FROM ObjectDetectionVideos
     WHERE id < 20
     LIMIT 5;

  SELECT id
      FROM ObjectDetectionVideos
      WHERE ['pedestrian', 'car'] <@ Yolo(data).label;
Would love to hear your thoughts on ChatCraft and a more constrained function interface.
I'm actually doing a lot of work with databases and LLMs.

I enjoyed postgresml and evadb has been on my radar to try next. Would love to connect.

(updated discord link)

Your back button does not work!