Hacker News
by simojo 83 days ago
Today I scheduled a dentist appointment over the phone with an LLM. At the end of the call, I prompted it with various math problems, all of which it answered before politely reminding me that it would prefer to help me with "all things dental."

It did get me thinking about the extent to which I could bypass the original prompt and use someone else's tokens for free.

5 comments

https://bsky.app/profile/theophite.bsky.social/post/3mhjxtxr...

>> "claude costs $20/mo but attaching an agent harness to the chipotle customer service endpoint is free"

>> "BurritoBypass: An agentic coding harness for extracting Python from customer-service LLMs that would really rather talk about guacamole."

https://bsky.app/profile/weiyen.net/post/3m7kenmok4c2n

I did something similar. Try framing your maths question in terms of teeth.

And this is another problem that is easily solved by someone who knows what they are doing…

Voice -> speech to text engine -> LLM creates JSON that the orchestrator understands -> JSON -> regular code as the orchestration -> text based response -> text to speech

Notice that I am not using the LLM to produce output to the user, and if the orchestrator (again, regular old code) doesn’t get valid input, it’s going to error. Sure, you can jailbreak my LLM interpretation. But my orchestrator is going to have the same role-based permissions as if I were using the same API as a backend for a website. Because I probably am.

Source: creating call centers with Amazon Connect is one of my specialties
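A minimal sketch of that orchestrator boundary, not the commenter's actual Amazon Connect setup. The intent names, the `ROLE_PERMISSIONS` table, and the reply strings are all hypothetical. The point it illustrates is that the LLM's text is only ever parsed as a command, and every sentence the caller hears comes from regular code.

```python
import json
from datetime import datetime

# Hypothetical role table: which intents each caller role may trigger.
ROLE_PERMISSIONS = {
    "patient": {"book_appointment", "cancel_appointment"},
    "anonymous": {"book_appointment"},
}

REFUSAL = "Sorry, I can only help with appointments."


def parse_command(llm_output: str) -> tuple:
    """Turn LLM output into (intent, value), or raise ValueError."""
    command = json.loads(llm_output)  # JSONDecodeError is a ValueError
    if not isinstance(command, dict):
        raise ValueError("expected a JSON object")
    intent = command.get("intent")
    if intent == "book_appointment" and set(command) == {"intent", "date", "time"}:
        # strptime rejects anything that is not a real date and time.
        when = datetime.strptime(
            f"{command['date']} {command['time']}", "%Y-%m-%d %H:%M"
        )
        return intent, when
    if intent == "cancel_appointment" and set(command) == {"intent", "appointment_id"}:
        appointment_id = command["appointment_id"]
        if type(appointment_id) is int:
            return intent, appointment_id
    raise ValueError("unrecognized command")


def handle(llm_output: str, role: str) -> str:
    """Return code-generated text only; the LLM's words never reach the caller."""
    try:
        intent, value = parse_command(llm_output)
    except ValueError:
        return REFUSAL
    if intent not in ROLE_PERMISSIONS.get(role, set()):
        return "Sorry, you are not allowed to do that."
    if intent == "book_appointment":
        return "Booked for " + value.strftime("%Y-%m-%d at %H:%M") + "."
    return f"Cancelled appointment {value}."
```

With this shape, a jailbroken model that answers a math question produces text that fails to parse, so the caller just hears the fixed refusal.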

> Notice that I am not using the LLM to produce output to the user

So what output does the user get?

The programmatically generated response from the orchestrator, which could be either a confirmation or a request for more information.

Sure, but does this have the context of the original question that the user asked? If not, it seems that it isn’t really conversational and is more of a “compiler”.

How would something like “I want an appointment either on Monday afternoon after 4pm or one on Tuesday before 11am” work?

Unless all the parameters given by the user fit within the constraints of the JSON format, the LLM would need the context of the request and the results to answer properly, would it not?

For reference, my last discussion about this

https://news.ycombinator.com/item?id=47241412

This is a constrained space. I would do the naive implementation first, then talk to the humans (like you), and then my JSON definition would include a timespan-type field.

My orchestrator would then say “I have these times available [list of times]. What time would you like?” and then return a specific LLM prompt to parse the information I need once the user responds. But I would send that exact text to the user. Yes, I’m purposefully constraining the implementation so that the LLM is never used for output and never directly controls the backend.
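As a rough illustration of the timespan idea, here is a sketch in which the LLM's JSON carries a list of requested windows and plain code picks the matching slots and builds the exact sentence the caller hears. The `start`/`end` field names, the ISO timestamp format, and the sample slots are assumptions made for the example.

```python
from datetime import datetime


def parse_windows(raw_windows):
    """Convert JSON windows (ISO strings) into (start, end) datetime pairs."""
    return [
        (datetime.fromisoformat(w["start"]), datetime.fromisoformat(w["end"]))
        for w in raw_windows
    ]


def slots_in_windows(available, windows):
    """Keep slots inside any window; start is inclusive, end is exclusive."""
    return [
        slot
        for slot in available
        if any(start <= slot < end for start, end in windows)
    ]


def offer_text(slots):
    """Build the fixed-wording prompt that is sent to the caller verbatim."""
    if not slots:
        return "I have no times available in that range. What other time would work?"
    listed = ", ".join(slot.strftime("%Y-%m-%d %H:%M") for slot in slots)
    return f"I have these times available: {listed}. What time would you like?"


# "Monday after 4pm or Tuesday before 11am", as the LLM might encode it.
requested = parse_windows([
    {"start": "2025-03-03T16:00", "end": "2025-03-03T18:00"},
    {"start": "2025-03-04T08:00", "end": "2025-03-04T11:00"},
])
calendar = [
    datetime(2025, 3, 3, 15, 0),
    datetime(2025, 3, 3, 16, 30),
    datetime(2025, 3, 4, 10, 0),
    datetime(2025, 3, 4, 11, 0),
]
print(offer_text(slots_in_windows(calendar, requested)))
```

The "either/or" request becomes two entries in one list, so the JSON schema stays small while still covering the example from the question above.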

There is also the concept of “semantic alignment”, where you ask the LLM to generically answer the question “does the user’s answer make sense with regard to the question?” as a first-level filter that only returns true or false. This is again a constrained function that you pass in the question and answer to the LLM and if you get something besides true or false your code errors.
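A sketch of such a true/false gate might look like the following. The `ask_llm` callable is a hypothetical stand-in for whatever model client is in use, and the prompt wording is only an assumption.

```python
from typing import Callable


def answer_makes_sense(
    question: str, answer: str, ask_llm: Callable[[str], str]
) -> bool:
    """Return the LLM's verdict, or raise if it says anything but true/false."""
    prompt = (
        "Does the user's answer make sense with regard to the question? "
        "Reply with exactly one word: true or false.\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )
    reply = ask_llm(prompt).strip().lower()
    if reply == "true":
        return True
    if reply == "false":
        return False
    # Anything else (an essay, a math answer, a jailbreak) is an error.
    raise ValueError("semantic alignment check returned a non-boolean reply")
```

Because the function can only return a boolean or raise, nothing the model writes here can flow onward to the caller or the backend.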

The purpose of an LLM, or even before that an old-school intent-based system (see my link), isn’t perfection, it’s “deflection”. The more that you can handle through automation, the less you have to bring a human in. An American-based call center with a human agent costs $3–$7 a call, fully allocated. An automated call can cost tenths of a penny.

Of course, that doesn’t include the cost of accepting a call in the first place over a 1-800 number and, in my case, the price that AWS charges per minute for Amazon Connect.
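To make the deflection arithmetic concrete, here is a back-of-the-envelope sketch. The 60% deflection rate, the $5.00 human cost (the midpoint of the $3–$7 range), and the $0.005 automated cost are illustrative assumptions, and telephony and per-minute platform charges are left out, as noted above.

```python
def blended_cost_per_call(
    deflection_rate: float, automated_cost: float, human_cost: float
) -> float:
    """Average cost per call when a share of calls never reaches a human."""
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return deflection_rate * automated_cost + (1 - deflection_rate) * human_cost


# Assumed inputs: 60% of calls deflected, $0.005 automated, $5.00 human.
cost = blended_cost_per_call(0.6, 0.005, 5.0)
print(f"${cost:.2f} per call")
```

Under those assumptions the average drops from $5.00 to roughly $2.00 per call, nearly all of it from the calls that still reach a person.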

> This is again a constrained function that you pass in the question and answer to the LLM and if you get something besides true or false your code errors.

Code erroring is fine for code, but what is the user experience here? Some sort of “computer says no” generic response, or something more contextual?

I’m trying to picture what the user says and hears as a response to an off-the-beaten-path question. Is it just “I don’t understand, here’s how to phrase it”?

Could just have used NLP

NLP doesn’t have world knowledge, and with one prompt I can support almost any language. Of course, the speech-to-text engine is specific to the language.

> politely reminding me that it would prefer to help me with "all things dental."

I'm amused to imagine it actually wasn't an LLM at all, just a good-natured Jeeves-like receptionist.

(AskJeeves came too early, much better suited as a name for Kagi or something like it!)

Haha, for sure someone has made a little aggregator for this to save tokens. I bet you gotta dig for a while, though, before you find a company exposing Opus 4.6 to customers and not Flash 2.5 Lite.