Hacker News
by
throwaway89201
295 days ago
The training sets of most LLMs contain a copious amount of content from Libgen (or now: Anna's Archive), where em dashes are frequently used in literary writing.
1 comment
nullc
294 days ago
Who the hell knows how the initial biases of LLMs broke.
My IRC name (gmaxwell) is a token in the GPT3 tokenizer.