by tuckerconnelly 812 days ago

I built an AI-agents tech demo[1], and am now pivoting. A few thoughts:

* I was able to make a simple AI agent that could control my Spotify account and make playlists based on its world knowledge (rather than Spotify's recommendation algos), which was really cool. I used it pretty frequently to steer Spotify toward my music tastes, and would say I got value out of it.

* GPT-4 actually worked quite well; GPT-3.5 worked maybe 80% of the time. Mixtral did not work at all, and it needed hacks/workarounds just to get function calling working in the first place.

* It was very slow and VERY expensive. Needing chain-of-thought (CoT) prompting was a limitation. I could easily rack up $30/day just testing it.
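For readers wondering what "hacks/workarounds" for function calling can look like: one common approach (not necessarily the one used in this demo) is to prompt the model to end each turn with a JSON object naming a tool, then parse that JSON yourself and feed the result back. The sketch below assumes that convention. The tools `search_tracks` and `add_to_playlist` are hypothetical stubs, not Spotify's API, and a scripted list of replies stands in for the model.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical tools: stubs standing in for real Spotify calls (no real API is used).
playlist: list[str] = []

def search_tracks(query: str) -> list[str]:
    return ["track:" + query.lower().replace(" ", "-")]

def add_to_playlist(track_id: str) -> str:
    playlist.append(track_id)
    return "added " + track_id

TOOLS: dict[str, Callable[..., object]] = {
    "search_tracks": search_tracks,
    "add_to_playlist": add_to_playlist,
}

def parse_tool_call(text: str) -> Optional[dict]:
    """Take the span from the first '{' to the last '}' and parse it as a tool call."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Only accept calls that name a known tool and pass keyword args as an object.
    if (isinstance(call, dict)
            and isinstance(call.get("tool"), str)
            and call["tool"] in TOOLS
            and isinstance(call.get("args", {}), dict)):
        return call
    return None

def run_agent(model: Callable[[list[str]], str], max_steps: int = 5) -> list[str]:
    """Loop: ask the model, run the tool it names, record the result."""
    history: list[str] = []
    for _ in range(max_steps):
        reply = model(history)
        call = parse_tool_call(reply)
        if call is None:  # no valid tool call: treat the reply as the final answer
            history.append(reply)
            break
        result = TOOLS[call["tool"]](**call.get("args", {}))
        history.append(f"{call['tool']} -> {result}")
    return history

# Scripted replies in place of a real model; the first one includes a short
# chain-of-thought before the JSON action.
script = iter([
    'I should find the song first. {"tool": "search_tracks", "args": {"query": "Blue Monday"}}',
    '{"tool": "add_to_playlist", "args": {"track_id": "track:blue-monday"}}',
    "Done: the playlist is updated.",
])
history = run_agent(lambda h: next(script))
print(history)
```

Every step here is a full model round trip, and with CoT each reply carries extra reasoning text before the action, which is where the slowness and cost described above come from.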

My overall takeaway: it's too early. It's too expensive, too slow, and too unreliable, unless you somehow have a breakthrough with a custom model.

From the marketing side, people just don't "get it." I've since niched down, and it's very, very promising from a business perspective.

[1] https://konos.ai

andy99 812 days ago

I think the agent approach is destined to fail because it basically moves AI back into the "rules-based" realm. Deep learning is a decent cognitive interface, like making a guess at some structure out of non-structure. That's where the magic happens. But when you take that and start using rules to chain it together, you're basically back to the same idea as parsing semi-structured data with regex and/or if statements. You can get it to work a bit, but edge cases keep coming along that kill you, and your rules will never keep up. For simple cognitive tasks, deep learning figures out enough of the edge cases to work pretty well, but that's gone once you start making rules for how to combine predictions.

thekumar 812 days ago

I totally agree with this. I have been arguing with folks that current Reactflow-based agent workflow tools are destined to fail and, more importantly, are missing the point. Stop forcing AI into structured work.

I do think AI "agents" (or blocks, as I like to think of them) unlock the potential for solving unstructured but well-scoped tasks. But each one is a block of unstructured work that is very specific to a problem, and you are unlikely to find another problem where that block fits. So trying to productize these AI blocks as reusable agents is not that great of a value prop, and building a node-based workflow tool is even less of one.

However, you can flip it inside out: build an AI agent that takes a question and outputs a node based workflow, where the blocks in the workflow are structured, predefined blocks with deterministic inputs and outputs, or a custom AI block that you built yourself. That is something I can find value in. It is almost like the function-calling capabilities of GPT.
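A minimal sketch of the "inside out" idea, under stated assumptions: the AI only proposes a plan (here, a JSON list of block names), and the plan is checked against a registry of deterministic blocks before anything runs. The block names and the hard-coded plan string are hypothetical; in a real system the plan would come from the model.

```python
import json
from typing import Callable

# Registry of hypothetical deterministic blocks: same input always gives same output.
BLOCKS: dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lowercase": str.lower,
    # dict.fromkeys keeps first occurrences in order, so this removes repeated words.
    "dedupe_words": lambda s: " ".join(dict.fromkeys(s.split())),
}

def load_plan(plan_json: str) -> list[str]:
    """Parse a planner's JSON output and reject any block not in the registry."""
    steps = json.loads(plan_json)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list of block names")
    unknown = [name for name in steps if name not in BLOCKS]
    if unknown:
        raise ValueError(f"unknown blocks: {unknown}")
    return steps

def run_plan(steps: list[str], value: str) -> str:
    """Run the blocks in order, piping each output into the next block."""
    for name in steps:
        value = BLOCKS[name](value)
    return value

# Stand-in for what an LLM planner might return for "normalize this tag list".
plan = load_plan('["strip", "lowercase", "dedupe_words"]')
print(run_plan(plan, "  Rock ROCK Jazz rock  "))  # prints: rock jazz
```

The model's only job is choosing and ordering blocks. Once a plan has been checked on a sample input, rerunning it on the same input gives the same output, and a plan that names a block outside the registry is rejected up front.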

Building these blocks reminds me of the early days of cloud computing. Back then, the patterns for high availability were not well established. People who were sold on the scalability of cloud computing and got on board without accounting for service failure/availability scenarios and the ephemeral nature of EC2 instances got burned, and complained that cloud computing wasn't feasible.

noway421 811 days ago

> AI agent that takes a question and outputs a node based workflow

That sounds useful to me. I find it hard to trust an AI black box to output a good result, especially when chained in a sequence of blocks, where errors can accumulate.

But AIs are great recommender systems. If it can output a sequence of blocks that are fully deterministic, I can run the sequence once, see it outputs a good result and trust it to output a good result in the future given I more or less understand what each individual box does. There may still be edge cases, and maybe the AI can also suggest when the workflow breaks, but at least I know it outputs the same result given the same input.

spxneo 812 days ago

What makes it slow? Is it because they throttle your API key?

tuckerconnelly 812 days ago

Chain of thought takes time because the model has to generate all that text. If you do a chain of thought for every action and every misstep (and you need to, for quality and reliability), it adds up.
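The "it adds up" point is simple arithmetic: agent steps run one after another, so per-step generation time multiplies by the number of steps. A rough sketch, assuming output generation dominates (it ignores prompt processing) and using purely illustrative numbers, not measurements from this project:

```python
def sequential_generation_seconds(steps: int, cot_tokens: int, action_tokens: int,
                                  tokens_per_second: float,
                                  overhead_s: float = 0.0) -> float:
    """Rough wall-clock time for an agent that generates CoT + an action every step.

    Each action depends on the previous result, so steps cannot overlap and
    the per-step times simply add up.
    """
    per_step = (cot_tokens + action_tokens) / tokens_per_second + overhead_s
    return steps * per_step

# Illustrative assumptions: 10 steps, 300 CoT tokens and 50 action tokens per
# step, 30 tokens/s generation speed, 0.5 s request overhead per call.
with_cot = sequential_generation_seconds(10, 300, 50, 30.0, 0.5)
without_cot = sequential_generation_seconds(10, 0, 50, 30.0, 0.5)
print(f"{with_cot:.0f} s vs {without_cot:.0f} s")  # prints: 122 s vs 22 s
```

Under these assumptions, the reasoning text, not the actions themselves, accounts for most of the wall-clock time.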

spxneo 812 days ago

Is there no way to share that "memory" across chats?

Or are we at the mercy of hosted models?

ShamelessC 812 days ago

There's caching, but only so much can be cached when small changes in the input can lead to an entirely different space of outputs. Furthermore, even with caching, LLM inference can take anywhere from 1 to 15 s using GPT-4 Turbo via the API. As was mentioned, the more characters you put in the context, the longer this takes. Similarly, the model's output has variable length (up to a fixed context length), so computing the "answer" can also take a while. With CoT in particular, you are basically forcing the model to use more characters in its answer than it otherwise would by asking it to explain itself in a verbose, step-by-step manner.
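To make the caching limitation concrete, here is a minimal client-side exact-match cache, one simple form of caching (providers may offer other mechanisms, which this does not model). The key is a hash of the model name and the full prompt, so any edit to the prompt, even one character, is a miss. `fake_complete` is a hypothetical stand-in for a real API call.

```python
import hashlib
from typing import Callable

class ExactMatchCache:
    """Cache completions keyed on the exact (model, prompt) pair."""

    def __init__(self, complete: Callable[[str, str], str]):
        self._complete = complete  # function that actually calls the model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._complete(model, prompt)
        self._store[key] = result
        return result

calls: list[str] = []

def fake_complete(model: str, prompt: str) -> str:
    calls.append(prompt)  # record every real (uncached) call
    return f"answer to {prompt!r}"

cache = ExactMatchCache(fake_complete)
cache.get("some-model", "Plan a playlist for a rainy day")
cache.get("some-model", "Plan a playlist for a rainy day")   # hit
cache.get("some-model", "Plan a playlist for a rainy day.")  # miss: one extra character
print(cache.hits, cache.misses)  # prints: 1 2
```

In an agent loop the context grows with every step's observations, so exact repeats are rare and most calls still pay full inference latency.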

itake 812 days ago

Our p99 latency for GPT-4 is 3 s. Images take up to 50 s.
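For context, a p99 of 3 s means 99% of requests finish within 3 s. A minimal sketch of computing it with the nearest-rank method (one of several percentile definitions; interpolating definitions can give slightly different values). The latency samples are made up for illustration.

```python
import math

def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# 100 made-up request latencies in seconds: mostly fast, with a slow tail.
latencies = [0.8] * 95 + [1.2, 1.9, 2.6, 4.1, 12.0]
print(percentile_nearest_rank(latencies, 50))   # prints: 0.8
print(percentile_nearest_rank(latencies, 99))   # prints: 4.1
print(percentile_nearest_rank(latencies, 100))  # prints: 12.0
```

With only 100 samples, p99 excludes just the single slowest request, so a rare slow path like the 50 s image calls can sit entirely above it.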

spxneo 812 days ago

So how would you go about improving that?

freediver 812 days ago

Not using an LLM for it.

itake 812 days ago

We only send 0.5-5% of our traffic to GPT-4, thanks to smaller, faster, cheaper models, so not all of our traffic is hit with 50 s latencies :-/
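One common way to send only a small share of traffic to the big model is a cascade: try a small model first and escalate only when it is unsure. The commenter doesn't say how their routing works, so this is a sketch of the general technique under assumptions. The models are hypothetical stubs, and the `confidence` score is assumed to exist; how you obtain one is model-specific.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be in [0, 1]
    model: str

def cascade(prompt: str,
            small: Callable[[str], Answer],
            large: Callable[[str], Answer],
            threshold: float = 0.8) -> Answer:
    """Try the cheap, fast model first; escalate only low-confidence cases."""
    first = small(prompt)
    if first.confidence >= threshold:
        return first
    return large(prompt)

# Hypothetical stand-ins for real model calls.
def small_model(prompt: str) -> Answer:
    conf = 0.95 if len(prompt) < 40 else 0.4  # pretend long prompts are harder
    return Answer(f"small: {prompt}", conf, "small")

def large_model(prompt: str) -> Answer:
    return Answer(f"large: {prompt}", 0.99, "large")

prompts = [
    "tag this photo",
    "classify this short review",
    "summarize this long, ambiguous multi-part support ticket",
]
routed = [cascade(p, small_model, large_model).model for p in prompts]
print(routed)  # prints: ['small', 'small', 'large']
```

The threshold controls what fraction of requests escalate. Escalated requests pay for both calls, so this pays off only when most traffic stays on the small model.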