| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hanoz 1135 days ago
	The more token capacity that's added the more wasteful it seems to have to use this statelessly. Is there any avoiding this? Wonderous as this new tech is, it seems a bit much to be paying $2 a question in a conversation about a 32k token text.

6 comments

ta988 1135 days ago

As a human if you have to present me 32k tokens and I have to give you an answer, you would probably have to pay me more than $2

link

hanoz 1135 days ago

If I wanted to have a conversation about it, and you wanted to charge me a flat fee per utterance on the basis that you had to reread the text anew every time, I wouldn't be paying you at all.

link

TeMPOraL 1135 days ago

If we were having such conversation via e-mail/IM and I learned that you're just asking me questions one by one in your replies, questions which you could've easily included in your first e-mail - then believe me when I say it, I would charge you the same way OpenAI does, and I'd throw in an extra 50% fee for being inconsiderate and not knowing how to communicate effectively.

link

hanoz 1134 days ago

> questions which you could've easily included in your first e-mail

That's not really how conversation/chat works is it?

link

jiggawatts 1135 days ago

Have you seen how lawyers bill for their time?

link

jtbayly 1135 days ago

Yeah, I can see this being useful for one-off queries, but don't they want to offer some sort of final training ("last-mile" I called it in another comment. I can't remember what the proper term is.) to companies to customize the model so it already has all the context they need baked in to every query?

link

sashank_1509 1135 days ago

They used to offer exactly this for fine tuning models. Never offered it after ChatGPT, I think the difficulty comes with fine tuning RLHF models, not obvious how to correctly do this.

link

notpachet 1135 days ago

This is available through Azure: https://azure.microsoft.com/en-us/products/cognitive-service...

link

BoorishBears 1135 days ago

As far as I know it's not.

link

heliophobicdude 1135 days ago

It's unfortunate. There are some online tutorials that instruct you to embed all your code and perform top-k cosine similarity searches, populating the responses accordingly.

It's quite interesting if you can tweak your search just right. You can even use less tokens than 8K even!

link

toxicFork 1135 days ago

The usage needs to be for high value queries.

Using it on a simple conversation is not its intended purpose, that's like using a supercomputer to play pong.

link

weird-eye-issue 1135 days ago

Handle the state on the application side...

It is like complaining that HTTP is limiting because it is stateless. Build state on top of it.

link

delusional 1135 days ago

I think he's talking about computational efficiency. If you're loading in 29k tokens and you're expecting to use those again, you wouldn't need to do the whole matrix multiplication song and dance again if you just kept the old buffers around for the next prompt.

link

weird-eye-issue 1135 days ago

I don't think this can necessarily be optimized at least with how the models work right now

link

mlyle 1135 days ago

You can ask multiple/multipart questions.

link