Hacker News new | ask | show | jobs
by paulmendoza 1144 days ago
I feel like this just killed a few small startups who were trying to offer more context.

Also, I pay for ChatGPT but I have none of the new features except for GPT4. Very frustrating.

12 comments

Same here: https://twitter.com/arbuge/status/1654288169397805057

The really odd thing is that I was given GPT-4 with browsing alpha enabled - for a single session last week.

As soon as I reloaded the page, it was gone. Since then the picture has reverted back to the above.

Twitter has become a bit painful to read these days, with all the AI influencers posting about what GPT-4 and plugins, code interpreter etc. can do.

I was waiting for a while, but then I found there was a page where if you selected "I want to build plugins" then you would have never seen the option to request them.

Once I filled that in I got access within a few days.

https://openai.com/waitlist/plugins

If you are the person to say: "I am a developer and want to build a plugin"

Then it is likely you missed the option to request which plugins you want access to.

OK thanks for the pointer I submited there for every plugin now as a not-developer to see if that helps.

I stopped paying the pro because without plugins it didnt do that much tbh

Yeah. It's weird, I've signed up when it came up and....nothing. :(
32k context is $1.92 for each request.
For reference, a dev making USD 100k/year and working about 240 days a year, 8 hours/day = total of 1920 hours, or about USD 52/hour, USD 416/day

52/1.92 = 27 416/1.92 = 217

So using GPT-4 with 32k tokens, 27 times per hour, or 217 times per day, in terms of cost, is approximately the equivalent of another dev

FYI, 27 times per hour is basically nothing. With GPT4 over the API, I make 2-3 completion requests a minute, for 30-60 minutes at a time, when building an LLM app. This happens for 3-4 hours per day.

At the upper bound, this would be $2 * 3 * 60 * 4 = $1440 a day.

Thankfully, I am using retriever-augmentation and context stuffing into the base 4k model, so costs are manageable.

The 32k context model cannot be deployed into a production app at this pricing as a more capable drop-in replacement for shorter-context models.

Depends heavily on your product. I can imagine there are quite a lot of use cases that have relatively infrequent API usage or highly cacheable responses.
> retriever-augmentation and context stuffing

Care you elaborate? This sounds very interesting & useful. Just anything about the setup and implementation would be super helpful.

That's a lot of requests.

Not that it matters for the calculation, but i wonder how long such a request (ingesting 32k tokens and responding with a similar amount) would take.

At the speed of regular ChatGPT take would take a good while.

Batch processing scales quadratically with the context size (assuming OpenAI is still using standard transformer architecture) but the batch processing of the prompt is also fast compared to generating tokens because it's batched (parallel). So I wouldn't expect effective response times to go up quadratically. At most linearly, depending on the details of how they implement inference.
Is it prorated for the actual context used for each request?
Yes, it's not a fixed price: https://openai.com/pricing.
Yes
It's exceedingly expensive and must surely come down over time.
Those startups will move on to open source models because OpenAI api calls with 32k token contexts are way too expensive.
What is the latest in conversational models that allow GPT3 like (or close) performance w.r.t running things locally?
Apparently Vicuna 13B is quite good according to Google's own leaked docs.

https://twitter.com/jelleprins/status/1654197282311491592

That's according to this (https://lmsys.org/blog/2023-03-30-vicuna/) promotional blog post and just cited by the google memo right? Which isn't really even a doc, just a memo that was circulating inside google.

I also find it strange they don't contrast gpt4 and gpt3.5

This assessment is based largely on GPT-4 evaluation of the output. In actual use, Vicuna-13B isn't even as good as GPT-3.5, although I do have high hopes for 30B if and when they decide to make that available (or someone else trains it, since the dataset is out).

And don't forget that all the LLaMA-based models only have 2K context size. It's good enough for random chat, but you quickly bump into it for any sort of complicated task solving or writing code. Increasing this to 4K - like GPT-3.5 has - would require significantly more RAM for the same model size.

Is there a way to always stay up to date with the latest and best performing models? Perhaps it's me but I find it difficult to navigate HuggingFace and find models sorted by benchmark.
Honestly, I just read hackernews :).
HN posts are not always in chronological order.
I check r/LocalLlama
GPT3 is dated so many open source models are competitive with it, but Vicuna 13b is supposed to be competitive with GPT4
Against GPT3.5 perhaps the gaps aren’t too big for your use cases, but I wouldn’t say it’s in the GPT4 league. It looks close in the benchmarks but the difference in quality feels (to me) huge in practice. The others models are simply a lot worse.
Interesting. Have you tried StableVicuna?
No, is it worth a try? I didn’t see a lot of hype about it so I didn’t try it.
I don't think it's expensive at all. For things that don't need to be so correct (like, unfortunately, marketing blog posts) it's a <$1 per post generator, which is very cheap to me.

For things where correctness matters, the majority of cost will still come from humans who are in charge of ensuring correctness.

Even if it was around 0.10$. This does not scale, it would need to be less than 0.01$ per generation to keep up with open source models where the cost effectively is 0$ (leaving our hardware). These open source models are still not replacing GPT4, but they are moving into that territory.
Oh really. Then show me your "open source model" that handles 32k tokens on a consumer-grade PC. Actually don't show me, show the internet. You will be the most famous man in tech world.
Well surely I can't convince you, feel free to build the next AI startup on OpenAI then, and stop caring about any possible competition out scaling you once token limits on open source models become more in line with the walled garden of Google, MS and OpenAI's high API pricing ;)
My bet is open source models (true open source without string attached) won't ever catch up OpenAI etc. I'll be really surprised if there is one that can match GPT-4 in the next 2~3 years. If you tried LLaMA and StableLM you would probably feel the same.
Use cases for individual people are ok but it's far too expensive to deploy into your SaaS where a large number of users will use it.
Considering that increasing context length is O(n^2), and that current 8k GPT-4 is already restricted to 25 prompts/3 hours, I think they will launch it at substantially higher pricing.
> current 8k GPT-4 is already restricted to 25 prompts/3 hours

I'm pretty sure they're using a 4k GPT-4 model for ChatGPT Plus, even though they only announced 8k and 32k... It can't handle more than 4k of tokens (actually a little below that, starts ignoring your last few sentences if you get close). If you check developer tools, the request to an API /models endpoint says the limit for GPT-4 is 4096. It's very unfortunate.

Ah this explains a lot. I couldn't understand why I couldn't get close to the ~12 pages that everyone was saying 8,000 tokens implied.
As far as I know it's not documented anywhere and there is no way to ask the team at ChatGPT questions. I sent them an email about it a few days after GPT-4 release and still haven't received a reply.

Another thing that annoys me is how most updates don't get a changelog entry. For whatever reason, they keep little secrets like that.

Their PR is terrible and I get the impression that they wish their own users would “just go away”.

Every time I see a company act like this, more responsive and truly open competition eventually eats their lunch.

The raw chat log has the system message on top, plus "user:" and "assistant:" for each message, and im_start/im_end tokens to separate messages, hence why the visible chat context is slightly under 4k.
Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.
GPT3 was released 3 years ago now. There have been major advancements in scaling attention so it would be strange if they didn't use some of them
It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.
It's not the scale itself, it's the scaling architecture.
It will be interesting to see how far this quadratic algorithm carries in practice. Even the longest documents can only have hundreds of thousands of tokens, right?
Ideally you'd be able to put your entire codebase + documentation + jira tickets + etc. into the context. I think there is no practical limit to how many tokens would be useful for users, so the limits imposed by the model (either hard limits or just pricing) will always be a bottleneck.
I'm confused by this. Would you want to just include your codebase, documentation, etc. in some last-mile training? That way you don't need the expense of including huge amounts of context in every query. It's baked in.
I haven't tried this myself, but it is my understanding that finetuning does not work well in practice as a way of acquiring new knowledge.

There may be a middle ground between these two approaches though. If every query used the same prompt prefix (because you only update the codebase + docs occasionally) then you could put it into the model once and cache the keys and values from the attention heads. I wonder if OpenAI does this with whatever prefix they use for ChatGPT?

Yeah there's really three options here... Throw everything in context, fine tune, or add external search a la RETRO.

The latter is definitely the cheapest option; updates are trivial.

Yah... we really need some kind of architecture that juggles concept vectors around to external storage and does similarity search, etc, instead of forcing us to encode everything into giant tangles of coefficients.

GPT-4 seems to show that linear algebra definitely can do the job, but training is so expensive and the model gets so huge and inflexible.

It seems like having fixed format vectors of knowledge that the model can use-- denser and more precise than just incorporating tool results as tokens like OpenAI's plugin approach-- is a path forward towards extensibility and online learning.

some of the context length will be lost to waste spent on truncated posts, or are replies not considered part of context on ChatGPT? In both cases, might be worth designing a prompt, every so often, to get a reply with which to re-establish the context, thus compressing it.
It’s been available on Azure in preview. Pricing is double the 8K model.
MosaicML StoryWriter 65K model just released a day or two ago.
https://www.mosaicml.com/blog/mpt-7b 65k+ context window, open source, open weights
Same. No plugins or GPT-4 API for me despite signing up for the waiting lists on the day they were announced.
Have you been using the API with GPT-3.5? I wonder if they're prioritizing access to 'active' users who appear to be trying to make something with it, over casual looky-loos.
Paying for chatgpt I believe is separate from API access
It is. For API access you have to create an account at https://platform.openai.com. You pay per 1k token. For API access to GPT-4 put your organization (org id) on the waitlist.
Finally got gpt4 api access. Now I can cancel my ChatGPT plus sub and save a bunch of cash by just using a local client.
Again, frustrating. I’m an antibiotics researcher with oodles of data and I need ChatGPT plugins/API to make any real progress. (I’m kind of in this intellectual space on my own, so other people can’t really help that much) I’m not sure why I’ve been on the waiting list for so long now.
I got access to ChatGPT plugins and they’re really bad, completely deserving of “alpha”. I’d be pissed if I paid 25$ for this fyi.

It’s very slow, almost 10X slower than ChatGPT

It’s integration is bad. For most plugins it doesn’t do anything smart with its API call. For example if I ask “Nearest cheap International flight”, it literally goes to Kayak and searches Nearest Cheap International Flight, if Kayak can’t handle that query, GPT can’t either.

The only plug-in with good integration is Wolfram and it makes so many syntax errors calling Wolfram that it’s thrash. Often it just syntax errors out for half my queries

I wouldn’t have minded if they spent a few more months internally testing plug-ins before rolling it out to me, seeing it’s current state. The annoying thing is the chat website automatically starts at plugins mode which is borderline unusable. So every time I have to click on the drop-down and then choose ChatGPT or GPT4.

Thanks for assuaging my FOMO a bit. I think one of the most frustrating parts is that everyone in my lab looks to me when they see this stuff on Twitter and all I can really do is shrug.
I use the API for anything I can't do with Bing Chat, but I've found Bing Chat to be quite useful.

For code, I use phind.com.

https://www.phind.com/tutorial

Dude, chill. Plugins are insanely new. Barely anyone has access to them. It just seems like they are widespread because they've been going viral.

The initial blog post was only just over a month ago, and it was announcing alpha access for a few users and developers:

> Today, we will begin extending plugin alpha access to users and developers from our waitlist. While we will initially prioritize a small number of developers and ChatGPT Plus users, we plan to roll out larger-scale access over time.

https://openai.com/blog/chatgpt-plugins

We are literally 1 month into the alpha of plugins.

I think part of the anxiety, at least for me, is how fast progress is being made too. Can begin to feel like the "LET ME IN" meme, when you're watching all day the cool things those inside the magic shop can do lol. Layman btw just looking to use it to automate some volunteer work I do. Thanks for this perspective on how new this stuff is.
I completely agree, I feel the same way as a dev. GPT-4 is not even 2 months old.

The developer livestream was on March 14th: https://www.youtube.com/live/outcGtbnMuQ?feature=share.

The time since GPT-4 already feels something like 6 months. So far I'm perpetually feeling behind.

Can't imagine trying to keep up as a dev. Any of these tools useful for you in practice yet?

I struggle to keep up and all I need to do is understand developments well enough to simplify them in to palatable morsels for my tech skeptic colleagues in politics and non profits.

Challenging because they have a form of technology PTSD. when they hear "new technology" nft's of monkeys with 6 digit prices and peter thiel's yacht flash before their eyes and they see red.

And I can't really blame them, the rhetoric around crypto was enough to sour most non techies (in my little corner of lefty politics anyway) against the idea that any tech advancement is noteworthy. One of the first more serious individuals in politics to hear me out did so because "i sounded like one of the early linux proselytizers" lol.

Completely agree how time has slowed. I rotate between absolute giddy anticipation at our future thanks to the tech and nihilistic doomerism. Even as a hobbyist though I knew to take this seriously since I saw robert miles talk about gpt 2 in 2017(?) and note there's zero sign of these things plateauing in ability simply by ramping up parameter count.

I've gone on long enough but that live stream felt like the intro to a sci fi movie at points. Can't wait to have multi modal and plugins rolled out.

I can’t believe it’s only been 1 month. It feels like 3-4 somehow.
Try OpenAI services in Azure. We were added to a waitlist but got approved a week later. Had 32k for a few weeks now but still on the waitlist for plugins.
> I feel like this just killed a few small startups who were trying to offer more context.

Those startups killed themselves. A 32K context was advertised as a feature to be rolled out the same day GPT-4 came out.

Also - what startups are getting even remotely close to 32K context at GPT-4’s parameter count? All I’ve seen is attempts to use KNN over a database to artificially improve long term recall.

Depends on the use case. Performance quickly tanks when you get to high token count; it's a slowdown I believe the various summarizers/context extenders mostly avoid.

(Also UI probably tanks too. I dread what the OpenAI Playground will do when you start actually using 32k model for real, like throwing a 15k token long prompt at it. ChatGPT UI has no chance.)

It's Hella expensive so I think they are ok for now

Until they cut down the cost then they should worry yeah

Honestly for the firms that would use it, for example finance or legal, it's very reasonable.