Yes, this is what we do as a RAG workflow. We created a list of all 32-bit unsigned integers and whether each one is even or odd, and we pass that into the context. The future is amazing!
We have an agentic system that looks up the context size, and then summarizes the even/odd table if necessary. We lose a little bit of accuracy, but now we can handle any model. Be sure to like & subscribe!
I have found that even 2-bit quantization works, but you have to make sure you only discard the LABs (that’s what we are calling the Left Aligned Bits internally). I have no idea why it works so well, but it has cut our costs significantly.
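For anyone curious, here is a minimal sketch of one likely reason discarding the left-aligned bits costs nothing: whether an integer is even or odd depends only on its lowest bit. The names `keep_low_bits` and `is_even` are made up for illustration and are not from any system described above.

```python
def keep_low_bits(n: int, bits: int = 2) -> int:
    """Discard everything except the lowest `bits` bits of n (hypothetical helper)."""
    return n & ((1 << bits) - 1)


def is_even(n: int) -> bool:
    """Parity check using only the lowest bit."""
    return (n & 1) == 0


# Parity is unchanged after "2-bit quantization" for a few unsigned 32-bit values.
for n in (0, 7, 42, 2**32 - 1):
    assert is_even(keep_low_bits(n)) == is_even(n)
```

Under that assumption, even 1-bit quantization would give the same answers.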
Does the RAG part look up just the needed number?
I think that Gemini has a million-token window (yes?) - do you have access to a model with a larger window?
Regardless, I find your ideas intriguing and wish to subscribe to your Substack.