| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by paulmendoza 1144 days ago
	I feel like this just killed a few small startups who were trying to offer more context. Also, I pay for ChatGPT but I have none of the new features except for GPT4. Very frustrating.

12 comments

arbuge 1144 days ago

Same here: https://twitter.com/arbuge/status/1654288169397805057

The really odd thing is that I was given GPT-4 with browsing alpha enabled - for a single session last week.

As soon as I reloaded the page, it was gone. Since then the picture has reverted back to the above.

Twitter has become a bit painful to read these days, with all the AI influencers posting about what GPT-4 and plugins, code interpreter etc. can do.

link

decompiled_dev 1144 days ago

I was waiting for a while, but then I found there was a page where if you selected "I want to build plugins" then you would have never seen the option to request them.

Once I filled that in I got access within a few days.

https://openai.com/waitlist/plugins

If you are the person to say: "I am a developer and want to build a plugin"

Then it is likely you missed the option to request which plugins you want access to.

link

tough 1144 days ago

OK thanks for the pointer I submited there for every plugin now as a not-developer to see if that helps.

I stopped paying the pro because without plugins it didnt do that much tbh

link

kinlan 1144 days ago

Yeah. It's weird, I've signed up when it came up and....nothing. :(

link

YetAnotherNick 1144 days ago

32k context is $1.92 for each request.

link

nico 1144 days ago

For reference, a dev making USD 100k/year and working about 240 days a year, 8 hours/day = total of 1920 hours, or about USD 52/hour, USD 416/day

52/1.92 = 27 416/1.92 = 217

So using GPT-4 with 32k tokens, 27 times per hour, or 217 times per day, in terms of cost, is approximately the equivalent of another dev

link

ukuina 1144 days ago

FYI, 27 times per hour is basically nothing. With GPT4 over the API, I make 2-3 completion requests a minute, for 30-60 minutes at a time, when building an LLM app. This happens for 3-4 hours per day.

At the upper bound, this would be $2 * 3 * 60 * 4 = $1440 a day.

Thankfully, I am using retriever-augmentation and context stuffing into the base 4k model, so costs are manageable.

The 32k context model cannot be deployed into a production app at this pricing as a more capable drop-in replacement for shorter-context models.

link

ZephyrBlu 1144 days ago

Depends heavily on your product. I can imagine there are quite a lot of use cases that have relatively infrequent API usage or highly cacheable responses.

link

bomewish 1143 days ago

> retriever-augmentation and context stuffing

Care you elaborate? This sounds very interesting & useful. Just anything about the setup and implementation would be super helpful.

link

ukuina 1142 days ago

This should get you started: https://haystack.deepset.ai/tutorials/22_pipeline_with_promp...

link

KeplerBoy 1144 days ago

That's a lot of requests.

Not that it matters for the calculation, but i wonder how long such a request (ingesting 32k tokens and responding with a similar amount) would take.

At the speed of regular ChatGPT take would take a good while.

link

atq2119 1144 days ago

Batch processing scales quadratically with the context size (assuming OpenAI is still using standard transformer architecture) but the batch processing of the prompt is also fast compared to generating tokens because it's batched (parallel). So I wouldn't expect effective response times to go up quadratically. At most linearly, depending on the details of how they implement inference.

link

achandlerwhite 1144 days ago

Is it prorated for the actual context used for each request?

link

ZephyrBlu 1144 days ago

Yes, it's not a fixed price: https://openai.com/pricing.

Yes

It's exceedingly expensive and must surely come down over time.

link

fbrncci 1144 days ago

Those startups will move on to open source models because OpenAI api calls with 32k token contexts are way too expensive.

link

cced 1144 days ago

What is the latest in conversational models that allow GPT3 like (or close) performance w.r.t running things locally?

link

noman-land 1144 days ago

Apparently Vicuna 13B is quite good according to Google's own leaked docs.

https://twitter.com/jelleprins/status/1654197282311491592

link

space_fountain 1144 days ago

That's according to this (https://lmsys.org/blog/2023-03-30-vicuna/) promotional blog post and just cited by the google memo right? Which isn't really even a doc, just a memo that was circulating inside google.

I also find it strange they don't contrast gpt4 and gpt3.5

link

int_19h 1144 days ago

This assessment is based largely on GPT-4 evaluation of the output. In actual use, Vicuna-13B isn't even as good as GPT-3.5, although I do have high hopes for 30B if and when they decide to make that available (or someone else trains it, since the dataset is out).

And don't forget that all the LLaMA-based models only have 2K context size. It's good enough for random chat, but you quickly bump into it for any sort of complicated task solving or writing code. Increasing this to 4K - like GPT-3.5 has - would require significantly more RAM for the same model size.

link

amelius 1144 days ago

Is there a way to always stay up to date with the latest and best performing models? Perhaps it's me but I find it difficult to navigate HuggingFace and find models sorted by benchmark.

link

noman-land 1144 days ago

Honestly, I just read hackernews :).

link

amelius 1144 days ago

HN posts are not always in chronological order.

link

nickthegreek 1144 days ago

I check r/LocalLlama

link

modernpink 1144 days ago

GPT3 is dated so many open source models are competitive with it, but Vicuna 13b is supposed to be competitive with GPT4

link

speedgoose 1144 days ago

Against GPT3.5 perhaps the gaps aren’t too big for your use cases, but I wouldn’t say it’s in the GPT4 league. It looks close in the benchmarks but the difference in quality feels (to me) huge in practice. The others models are simply a lot worse.

link

modernpink 1144 days ago

Interesting. Have you tried StableVicuna?

link

speedgoose 1144 days ago

No, is it worth a try? I didn’t see a lot of hype about it so I didn’t try it.

link

raincole 1144 days ago

I don't think it's expensive at all. For things that don't need to be so correct (like, unfortunately, marketing blog posts) it's a <$1 per post generator, which is very cheap to me.

For things where correctness matters, the majority of cost will still come from humans who are in charge of ensuring correctness.

link

fbrncci 1144 days ago

Even if it was around 0.10$. This does not scale, it would need to be less than 0.01$ per generation to keep up with open source models where the cost effectively is 0$ (leaving our hardware). These open source models are still not replacing GPT4, but they are moving into that territory.

link

raincole 1143 days ago

Oh really. Then show me your "open source model" that handles 32k tokens on a consumer-grade PC. Actually don't show me, show the internet. You will be the most famous man in tech world.

link

fbrncci 1143 days ago

Well surely I can't convince you, feel free to build the next AI startup on OpenAI then, and stop caring about any possible competition out scaling you once token limits on open source models become more in line with the walled garden of Google, MS and OpenAI's high API pricing ;)

link

raincole 1143 days ago

My bet is open source models (true open source without string attached) won't ever catch up OpenAI etc. I'll be really surprised if there is one that can match GPT-4 in the next 2~3 years. If you tried LLaMA and StableLM you would probably feel the same.

link

danjc 1144 days ago

Use cases for individual people are ok but it's far too expensive to deploy into your SaaS where a large number of users will use it.

link

MichaelZuo 1144 days ago

Considering that increasing context length is O(n^2), and that current 8k GPT-4 is already restricted to 25 prompts/3 hours, I think they will launch it at substantially higher pricing.

link

tempaccount420 1144 days ago

> current 8k GPT-4 is already restricted to 25 prompts/3 hours

I'm pretty sure they're using a 4k GPT-4 model for ChatGPT Plus, even though they only announced 8k and 32k... It can't handle more than 4k of tokens (actually a little below that, starts ignoring your last few sentences if you get close). If you check developer tools, the request to an API /models endpoint says the limit for GPT-4 is 4096. It's very unfortunate.

link

reaperman 1144 days ago

Ah this explains a lot. I couldn't understand why I couldn't get close to the ~12 pages that everyone was saying 8,000 tokens implied.

link

tempaccount420 1144 days ago

As far as I know it's not documented anywhere and there is no way to ask the team at ChatGPT questions. I sent them an email about it a few days after GPT-4 release and still haven't received a reply.

Another thing that annoys me is how most updates don't get a changelog entry. For whatever reason, they keep little secrets like that.

link

jiggawatts 1143 days ago

Their PR is terrible and I get the impression that they wish their own users would “just go away”.

Every time I see a company act like this, more responsive and truly open competition eventually eats their lunch.

link

int_19h 1144 days ago

The raw chat log has the system message on top, plus "user:" and "assistant:" for each message, and im_start/im_end tokens to separate messages, hence why the visible chat context is slightly under 4k.

link

cubefox 1144 days ago

O(n^2) seems unlikely:

https://cognitiverevolution.substack.com/p/openais-foundry-l....

https://news.ycombinator.com/item?id=34977194#:~:text=Sparse...

link

MichaelZuo 1144 days ago

Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.

link

space_fountain 1144 days ago

GPT3 was released 3 years ago now. There have been major advancements in scaling attention so it would be strange if they didn't use some of them

link

MichaelZuo 1142 days ago

It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.

link

cubefox 1142 days ago

It's not the scale itself, it's the scaling architecture.

link

choeger 1144 days ago

It will be interesting to see how far this quadratic algorithm carries in practice. Even the longest documents can only have hundreds of thousands of tokens, right?

link

sebzim4500 1144 days ago

Ideally you'd be able to put your entire codebase + documentation + jira tickets + etc. into the context. I think there is no practical limit to how many tokens would be useful for users, so the limits imposed by the model (either hard limits or just pricing) will always be a bottleneck.

link

jtbayly 1144 days ago

I'm confused by this. Would you want to just include your codebase, documentation, etc. in some last-mile training? That way you don't need the expense of including huge amounts of context in every query. It's baked in.

link

sebzim4500 1144 days ago

I haven't tried this myself, but it is my understanding that finetuning does not work well in practice as a way of acquiring new knowledge.

There may be a middle ground between these two approaches though. If every query used the same prompt prefix (because you only update the codebase + docs occasionally) then you could put it into the model once and cache the keys and values from the attention heads. I wonder if OpenAI does this with whatever prefix they use for ChatGPT?

link

sdenton4 1144 days ago

Yeah there's really three options here... Throw everything in context, fine tune, or add external search a la RETRO.

The latter is definitely the cheapest option; updates are trivial.

link

mlyle 1144 days ago

Yah... we really need some kind of architecture that juggles concept vectors around to external storage and does similarity search, etc, instead of forcing us to encode everything into giant tangles of coefficients.

GPT-4 seems to show that linear algebra definitely can do the job, but training is so expensive and the model gets so huge and inflexible.

It seems like having fixed format vectors of knowledge that the model can use-- denser and more precise than just incorporating tool results as tokens like OpenAI's plugin approach-- is a path forward towards extensibility and online learning.

link

Keyframe 1144 days ago

some of the context length will be lost to waste spent on truncated posts, or are replies not considered part of context on ChatGPT? In both cases, might be worth designing a prompt, every so often, to get a reply with which to re-establish the context, thus compressing it.

link

totoglazer 1144 days ago

It’s been available on Azure in preview. Pricing is double the 8K model.

link

HarHarVeryFunny 1144 days ago

MosaicML StoryWriter 65K model just released a day or two ago.

link

chrisMyzel 1144 days ago

https://www.mosaicml.com/blog/mpt-7b 65k+ context window, open source, open weights

link

chillfox 1144 days ago

Same. No plugins or GPT-4 API for me despite signing up for the waiting lists on the day they were announced.

link

saulpw 1144 days ago

Have you been using the API with GPT-3.5? I wonder if they're prioritizing access to 'active' users who appear to be trying to make something with it, over casual looky-loos.

link

jerrygenser 1144 days ago

Paying for chatgpt I believe is separate from API access

link

maxdaten 1144 days ago

It is. For API access you have to create an account at https://platform.openai.com. You pay per 1k token. For API access to GPT-4 put your organization (org id) on the waitlist.

link

nickthegreek 1144 days ago

Finally got gpt4 api access. Now I can cancel my ChatGPT plus sub and save a bunch of cash by just using a local client.

link

VeninVidiaVicii 1144 days ago

Again, frustrating. I’m an antibiotics researcher with oodles of data and I need ChatGPT plugins/API to make any real progress. (I’m kind of in this intellectual space on my own, so other people can’t really help that much) I’m not sure why I’ve been on the waiting list for so long now.

link

sashank_1509 1144 days ago

I got access to ChatGPT plugins and they’re really bad, completely deserving of “alpha”. I’d be pissed if I paid 25$ for this fyi.

It’s very slow, almost 10X slower than ChatGPT

It’s integration is bad. For most plugins it doesn’t do anything smart with its API call. For example if I ask “Nearest cheap International flight”, it literally goes to Kayak and searches Nearest Cheap International Flight, if Kayak can’t handle that query, GPT can’t either.

The only plug-in with good integration is Wolfram and it makes so many syntax errors calling Wolfram that it’s thrash. Often it just syntax errors out for half my queries

I wouldn’t have minded if they spent a few more months internally testing plug-ins before rolling it out to me, seeing it’s current state. The annoying thing is the chat website automatically starts at plugins mode which is borderline unusable. So every time I have to click on the drop-down and then choose ChatGPT or GPT4.

link

VeninVidiaVicii 1144 days ago

Thanks for assuaging my FOMO a bit. I think one of the most frustrating parts is that everyone in my lab looks to me when they see this stuff on Twitter and all I can really do is shrug.

link

JieJie 1144 days ago

I use the API for anything I can't do with Bing Chat, but I've found Bing Chat to be quite useful.

For code, I use phind.com.

https://www.phind.com/tutorial

link

ZephyrBlu 1144 days ago

Dude, chill. Plugins are insanely new. Barely anyone has access to them. It just seems like they are widespread because they've been going viral.

The initial blog post was only just over a month ago, and it was announcing alpha access for a few users and developers:

> Today, we will begin extending plugin alpha access to users and developers from our waitlist. While we will initially prioritize a small number of developers and ChatGPT Plus users, we plan to roll out larger-scale access over time.

https://openai.com/blog/chatgpt-plugins

We are literally 1 month into the alpha of plugins.

link

mptest 1144 days ago

I think part of the anxiety, at least for me, is how fast progress is being made too. Can begin to feel like the "LET ME IN" meme, when you're watching all day the cool things those inside the magic shop can do lol. Layman btw just looking to use it to automate some volunteer work I do. Thanks for this perspective on how new this stuff is.

link

ZephyrBlu 1144 days ago

I completely agree, I feel the same way as a dev. GPT-4 is not even 2 months old.

The developer livestream was on March 14th: https://www.youtube.com/live/outcGtbnMuQ?feature=share.

The time since GPT-4 already feels something like 6 months. So far I'm perpetually feeling behind.

link

mptest 1144 days ago

Can't imagine trying to keep up as a dev. Any of these tools useful for you in practice yet?

I struggle to keep up and all I need to do is understand developments well enough to simplify them in to palatable morsels for my tech skeptic colleagues in politics and non profits.

Challenging because they have a form of technology PTSD. when they hear "new technology" nft's of monkeys with 6 digit prices and peter thiel's yacht flash before their eyes and they see red.

And I can't really blame them, the rhetoric around crypto was enough to sour most non techies (in my little corner of lefty politics anyway) against the idea that any tech advancement is noteworthy. One of the first more serious individuals in politics to hear me out did so because "i sounded like one of the early linux proselytizers" lol.

Completely agree how time has slowed. I rotate between absolute giddy anticipation at our future thanks to the tech and nihilistic doomerism. Even as a hobbyist though I knew to take this seriously since I saw robert miles talk about gpt 2 in 2017(?) and note there's zero sign of these things plateauing in ability simply by ramping up parameter count.

I've gone on long enough but that live stream felt like the intro to a sci fi movie at points. Can't wait to have multi modal and plugins rolled out.

link

VeninVidiaVicii 1144 days ago

I can’t believe it’s only been 1 month. It feels like 3-4 somehow.

link

danjc 1144 days ago

Try OpenAI services in Azure. We were added to a waitlist but got approved a week later. Had 32k for a few weeks now but still on the waitlist for plugins.

link

ShamelessC 1144 days ago

> I feel like this just killed a few small startups who were trying to offer more context.

Those startups killed themselves. A 32K context was advertised as a feature to be rolled out the same day GPT-4 came out.

Also - what startups are getting even remotely close to 32K context at GPT-4’s parameter count? All I’ve seen is attempts to use KNN over a database to artificially improve long term recall.

link

TeMPOraL 1144 days ago

Depends on the use case. Performance quickly tanks when you get to high token count; it's a slowdown I believe the various summarizers/context extenders mostly avoid.

(Also UI probably tanks too. I dread what the OpenAI Playground will do when you start actually using 32k model for real, like throwing a 15k token long prompt at it. ChatGPT UI has no chance.)

link

toxicFork 1144 days ago

It's Hella expensive so I think they are ok for now

Until they cut down the cost then they should worry yeah

link

fakedang 1144 days ago

Honestly for the firms that would use it, for example finance or legal, it's very reasonable.

link