| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by biddit 817 days ago

> The larger context window (200k tokens vs ~16k)

Just to add some clarification - the newer GPT4 models from OpenAI have 128k context windows[1]. I regularly load in the entirety of my React/Django project, via Aider.

1. https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...

1 comments

danielbln 817 days ago

GPT-4 has much worse recall compared to Claude-3 though, compare these two haystack tests:

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/ma...

https://cdn.arstechnica.net/wp-content/uploads/2024/03/claud...

link

aubanel 816 days ago

Be aware the haystack test is not good at all (in its current form). It's a single piece of information inserted in the same text each time, a very poor measurement of how well the LLM can retrieve info.

link

bufferoverflow 816 days ago

Seems like a very good test for recall.

link

aubanel 815 days ago

Even in the most restrictive définition of recall as in "retrieve a short contiguous piece of information inside an unrelated context", it's not that good. It's always the exact same needle inserted in the exact same context. Not the slightest variation apart from the location of the needle.

Then if you want to test for recall of sparse information or multi-hop information, it's useless.

link

BA2255 802 days ago

For my education, how do you use the 200k contenxt the normal chats like Poe, or chatgpt don't accept longer than 4k maximum. Do you use them in specific Playgrounds or other places?

link

aubanel 798 days ago

The calls with long context are done through specific APIs, that you can call for instance in Python or Javascript.

Here's a quick start guide with OpenAI: https://platform.openai.com/docs/quickstart?context=python

link