pip install "headroom-ai[proxy]"
headroom proxy --port 8787
It will:
* Inspect all the data going into the LLM and compress it according to content type, with different strategies for JSON, code, etc.
* Keep the compression reversible: if the LLM cannot find what it is looking for in the compressed context, the original content can be restored, so accuracy is not lost.
* Trim the output of MCP tools, code function calls, etc. that fills up the context window, which removes the needle-in-a-haystack problems it causes.
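To make the first two points concrete, here is a toy sketch of content-type-aware, reversible compression. It is not Headroom's implementation; the names `compress`, `retrieve`, `_originals` and the `retrieve_key` field are all hypothetical, and the only strategy shown is truncating large JSON arrays while keeping the original available on request.

```python
import hashlib
import json

# Hypothetical in-memory store: short key -> original, uncompressed payload.
_originals: dict[str, str] = {}


def compress(payload: str, keep: int = 3) -> str:
    """Shrink large JSON arrays; return any other content unchanged."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return payload  # not JSON: this toy sketch leaves it alone
    if not isinstance(data, list) or len(data) <= keep:
        return payload  # small or non-array JSON: nothing worth compressing
    # Remember the original so the compression can be undone later.
    key = hashlib.sha256(payload.encode()).hexdigest()[:12]
    _originals[key] = payload
    return json.dumps(
        {"items": data[:keep], "omitted": len(data) - keep, "retrieve_key": key}
    )


def retrieve(key: str) -> str:
    """Return the original payload for a key produced by compress()."""
    return _originals[key]
```

The idea is that the model sees the short version first and only asks for the full payload (via the key) when the short version does not contain what it needs.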
There is also an SDK which works like this:
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")
I've personally used it with Claude Code and Cursor and seen the benefits.
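For the Claude Code and Cursor case, here is a rough sketch of how a client could be pointed at the proxy. `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` are the environment variables the official OpenAI and Anthropic clients read; the `/v1` path and the helper name `proxy_env` are assumptions, so check the Headroom docs for the exact URLs.

```python
import os

# Port used in the `headroom proxy --port 8787` command.
PROXY_PORT = 8787


def proxy_env(port: int = PROXY_PORT) -> dict:
    """Return a copy of the environment with client base URLs set to the proxy."""
    env = dict(os.environ)
    # Assumption: the proxy serves an OpenAI-compatible API under /v1.
    env["OPENAI_BASE_URL"] = f"http://localhost:{port}/v1"
    # Assumption: Anthropic-style clients can use the proxy root directly.
    env["ANTHROPIC_BASE_URL"] = f"http://localhost:{port}"
    return env


# Example (not run here): launch a tool with the proxied environment.
# import subprocess
# subprocess.run(["claude"], env=proxy_env())
```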