Hacker News
by csomar 340 days ago
My understanding is that caching reduces computation, but the whole input is still processed. I don’t think they are fully disclosing how their cache works.
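
To make that concrete, here is a toy sketch of how prefix (KV) caching generally works, as I understand it. It is not any provider's actual implementation. `ToyKVCache`, `project`, and the 0.5/2.0/1.0 "weights" are made up for illustration. The point is that a cache hit skips recomputing keys/values for the prefix, but the new token still attends over every cached position.

```python
import math

def project(x, w):
    # Toy stand-in for a learned projection: scale every component by w.
    return [w * v for v in x]

def attend(query, keys, values):
    # Plain softmax attention of one query over all keys/values.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class ToyKVCache:
    # Hypothetical cache: stores per-token keys and values.
    def __init__(self):
        self.keys = []
        self.values = []
        self.projections = 0  # tokens whose K/V had to be computed

    def extend(self, tokens):
        for t in tokens:
            self.keys.append(project(t, 0.5))
            self.values.append(project(t, 2.0))
            self.projections += 1

def last_token_output(cache, new_tokens):
    # Compute K/V only for tokens not already cached, then attend
    # over ALL positions (cached prefix plus new tokens).
    cache.extend(new_tokens)
    query = project(new_tokens[-1], 1.0)
    return attend(query, cache.keys, cache.values), len(cache.keys)

prefix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
suffix = [[0.5, -0.5]]

# Cold request: nothing cached, every token is projected.
cold = ToyKVCache()
out_cold, attended_cold = last_token_output(cold, prefix + suffix)

# Warm request: the prefix K/V is already cached from an earlier call.
warm = ToyKVCache()
warm.extend(prefix)
warm.projections = 0  # only count work done for this request
out_warm, attended_warm = last_token_output(warm, suffix)
```

In this toy, the warm request computes K/V for one token instead of four, yet both requests attend over all four positions and produce the same output. So the cache saves work without shortening the context the model actually sees.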

LLM output quality degrades with long inputs regardless of caching.