by pants2 699 days ago
I don't expect organizations to need to generate 1T output tokens, but 1T input tokens is common. Consider developers at a large company running queries with their entire codebase as context, or lawyers plugging in the entire tax code to ask questions about it. With each of them running dozens of queries per day on multiple millions of tokens of input context, it adds up quickly.
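A rough back-of-envelope sketch of how quickly input tokens accumulate. The user count, queries per day, and tokens per query below are illustrative assumptions chosen to match "dozens of queries per day on multiple millions of tokens", not figures from any real organization.

```python
# Back-of-envelope estimate of daily input-token volume for long-context queries.
# All parameter values used below are illustrative assumptions, not measurements.

def daily_input_tokens(users: int, queries_per_user: int, tokens_per_query: int) -> int:
    """Total input tokens consumed per day across all users."""
    return users * queries_per_user * tokens_per_query


def days_to_reach(target_tokens: int, per_day: int) -> float:
    """Days needed to accumulate target_tokens at a steady daily rate."""
    return target_tokens / per_day


if __name__ == "__main__":
    # Hypothetical organization: 1,000 users, 24 queries each per day,
    # 2 million input tokens of context per query.
    per_day = daily_input_tokens(users=1_000, queries_per_user=24, tokens_per_query=2_000_000)
    print(f"{per_day:,} input tokens/day")                           # 48,000,000,000 input tokens/day
    print(f"{days_to_reach(10**12, per_day):.1f} days to reach 1T")  # 20.8 days to reach 1T
```

Under these assumptions a single mid-sized deployment crosses 1T input tokens in about three weeks; the point is the multiplication, not the specific numbers.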

Wouldn't a lawyer wanting to run queries against the entire tax code use a model that was fine-tuned on all of that data, though? As opposed to doing RAG by sending the entire tax code with each request.
Unclear, but fine-tuning has many problems not faced by RAG:

- More prone to hallucinations

- Worse at citing sources for people to double check outputs

- Can't be updated without retraining

- Can't impose knowledge access controls for different users
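To make the last three points concrete, here is a toy retriever sketch. It is a minimal illustration, not any particular RAG library's API: the `Chunk`, `ToyIndex`, and `build_prompt` names and the `allowed_groups` ACL field are hypothetical, and simple keyword overlap stands in for the embedding-based vector search a real system would use. It shows that new documents are added without retraining, that access control is enforced before ranking, and that each retrieved passage carries a source id the answer can cite.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str  # source identifier, returned so answers can cite it
    text: str
    allowed_groups: set[str] = field(default_factory=set)  # hypothetical ACL field


class ToyIndex:
    """Minimal in-memory retriever; keyword overlap stands in for vector search."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        # Updating knowledge is just adding a chunk; no model retraining involved.
        self.chunks.append(chunk)

    def search(self, query: str, user_groups: set[str], k: int = 3) -> list[Chunk]:
        terms = set(query.lower().split())
        # Enforce access control first: drop chunks none of the user's groups may see.
        visible = [c for c in self.chunks if c.allowed_groups & user_groups]
        # Score by number of shared words and keep only chunks with some overlap.
        scored = [(len(terms & set(c.text.lower().split())), c) for c in visible]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    # Tag each passage with its doc_id so the model, and the reader, can cite it.
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in hits)
    return f"Answer using only the sources below and cite their ids.\n{context}\n\nQ: {question}"


if __name__ == "__main__":
    index = ToyIndex()
    index.add(Chunk("tax-101", "standard deduction rules for individuals", {"public"}))
    index.add(Chunk("memo-7", "internal memo on deduction audit strategy", {"partners"}))
    hits = index.search("deduction rules", user_groups={"public"})
    print([c.doc_id for c in hits])  # ['tax-101']
```

A fine-tuned model has no equivalent of the `allowed_groups` filter: once the memo is in the weights, every user of that model can potentially elicit it.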