Show HN: Headroom (OSS): Cuts LLM costs by 85% (github.com)
3 points by chopratejas 160 days ago
4 comments

Some results from real world data so far:

  ┌─────────────────┬─────────────┬──────────────────────────────┐
  │    Data Type    │ Compression │             Why              │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Server logs     │ 90%+        │ Highly repetitive patterns   │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ MCP tool output │ 70%+        │ JSON structure overhead      │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Database rows   │ 50-70%      │ Same schema, many records    │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ File trees      │ 40-50%      │ Repeated metadata            │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Code diffs      │ 0%          │ Every line unique            │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Dense prose     │ -0.3%       │ No patterns, slight overhead │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Encrypted       │ 0%          │ Incompressible               │
  └─────────────────┴─────────────┴──────────────────────────────┘
What is it?

- Context compression for LLMs, with reversibility (this part is the difference)

- Very different from compression or summarization tools that promise cost savings and speed

- Claude Code / Cursor costs reduced by 50-60%

- Ideal for startups and enterprises

- Integration with LangChain

- Memory as a first-class citizen

- It's OSS, so it's free!

Give it a try. It's OSS: if you love it, star it. If you don't, let's make it better, together!

Seems very useful. I tried it with Claude Code and it saved approximately 50%. Do you know how I can push it to save more? Do you also have plans to make it enterprise-ready?
Not a single example of what it does or how it works
Fair enough, I was trying to keep it concise here. This is how you install it and start the proxy:

pip install "headroom-ai[proxy]"

headroom proxy --port 8787

It will:

* Check all the data going into the LLM and apply compression based on the content type (different strategies for JSON, code, etc.)

* Keep the compression reversible: if the LLM is not finding what it is looking for in the compressed context, the original content can be recovered, so the LLM does not lose accuracy

* Shrink MCP tool output, function call results, etc. that fill up the context window and cause needle-in-a-haystack problems
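To make "reversible, content-type-aware compression" concrete, here is a minimal Python sketch for one content type: a list of JSON records that share a schema. This is not Headroom's actual algorithm, and `compress_records` / `decompress_records` are hypothetical names used only for illustration. The idea is to store the repeated keys once and keep the values as rows, which can be expanded back to the original records exactly.

```python
import json


def compress_records(records):
    """Pack same-schema dicts into one key list plus rows of values.

    Hypothetical helper for illustration; not part of Headroom.
    """
    if not records:
        return {"keys": [], "rows": []}
    keys = list(records[0].keys())
    for rec in records:
        # Only handle the simple case where every record has the same keys
        # in the same order; anything else is rejected rather than guessed.
        if list(rec.keys()) != keys:
            raise ValueError("records do not share one schema")
    return {"keys": keys, "rows": [[rec[k] for k in keys] for rec in records]}


def decompress_records(packed):
    """Rebuild the original list of dicts from the packed form."""
    return [dict(zip(packed["keys"], row)) for row in packed["rows"]]


# Example: 100 rows with an identical schema, similar to database output.
records = [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(100)]

original = json.dumps(records)
packed = json.dumps(compress_records(records))
restored = decompress_records(json.loads(packed))

# Saving is measured in characters here, not tokens (an assumption made
# to keep the sketch dependency-free).
saving = 1 - len(packed) / len(original)
print(f"lossless: {restored == records}, characters saved: {saving:.0%}")
```

Because the round trip is lossless, the full records can always be handed back if the compact form turns out not to be enough. Real token savings depend on the tokenizer and the data, so the character count above is only a rough proxy.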

There is also an SDK which works like this:

from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")

I've personally used it with Claude Code and Cursor and seen the benefits.