| HN Mirror



	by swyx 1023 days ago
	It's not so much about benefit as it is a design goal to want large context windows. https://twitter.com/suchenzang/status/1699926157028897078?s=... notes some issues with directly comparing the 16k context number. The odd choice of tokenizer means it's effectively like a 10-12k model (ballpark, not calculated).

2 comments

euclaise 1022 days ago

That tweet had it backwards: more tokens in the tokenizer's vocabulary means the 16k-token context window typically allows for even longer passages than a 16k LLaMA would.


craigacp 1022 days ago

There's a correction to that tweet: a larger vocab means fewer tokens for any given sequence (usually, assuming the extra vocab isn't there to add other languages or character sets).
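
A toy sketch of that point, in case it helps. This is not any real model's tokenizer: the greedy longest-match function, both vocabularies, and the sample sentence are made up for illustration. It only shows that adding longer entries to a vocabulary lowers the token count for the same text, so a fixed token budget covers more of it.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization (hypothetical toy, not a real BPE).

    Falls back to a single character when no longer vocab entry matches.
    """
    max_len = max(len(tok) for tok in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


text = "the context window holds the tokens"

# Small vocabulary: individual characters only.
small_vocab = set(text)
# Larger vocabulary: the same characters plus a few whole-word entries.
large_vocab = small_vocab | {
    "the", " the", " context", " window", " holds", " tokens",
}

small_tokens = greedy_tokenize(text, small_vocab)
large_tokens = greedy_tokenize(text, large_vocab)

small_count = len(small_tokens)
large_count = len(large_tokens)

print("small vocab:", small_count, "tokens")
print("large vocab:", large_count, "tokens")
print("chars per token (large):", len(text) / large_count)
```

Same string, far fewer tokens with the larger vocab, which is why a 16k-token window under a bigger vocabulary usually holds more text rather than less. The caveat in the parenthetical still applies: if the extra entries cover other languages or scripts, they do nothing for English text.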
