by dheera 990 days ago
I feel like information theory rules out full information retention over unlimited context lengths with finite compute, but I don't know whether we're close enough to information-theoretic limits for that argument to apply. Or rather, I don't know how to do a good analysis of (bits of context information) per (bits of model parameters).
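
For what it's worth, the crudest version of that ratio is just a counting argument, sketched below. Everything here is an assumption for illustration: `state_bits` is a hypothetical fixed memory budget (parameters, recurrent state, cache, whatever is doing the remembering), and `bits_per_token` is an assumed average entropy per token. A store of S bits can't losslessly hold more than S bits of context, so the retainable fraction has to fall once the context needs more than that.

```python
import math


def context_bits(num_tokens: int, bits_per_token: float) -> float:
    """Information content of the context under the assumed per-token entropy."""
    return num_tokens * bits_per_token


def max_lossless_tokens(state_bits: float, bits_per_token: float) -> int:
    """Upper bound on tokens that could be recovered exactly from state_bits.

    Pure counting bound: it says nothing about whether any real model
    gets close to it.
    """
    if state_bits < 0 or bits_per_token <= 0:
        raise ValueError("state_bits must be >= 0 and bits_per_token > 0")
    return math.floor(state_bits / bits_per_token)


def retention_ratio(num_tokens: int, bits_per_token: float, state_bits: float) -> float:
    """Best-case fraction of the context's information that fits in state_bits."""
    needed = context_bits(num_tokens, bits_per_token)
    if needed == 0:
        return 1.0
    return min(1.0, state_bits / needed)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    print(max_lossless_tokens(state_bits=1000, bits_per_token=10))
    print(retention_ratio(num_tokens=200, bits_per_token=10, state_bits=1000))
```

This only gives a ceiling. Whether current models are anywhere near it is exactly the part I don't know how to estimate.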