by michaelrbock 242 days ago
Hi, author of this paper + repo here. This dataset is particularly hard to come by, so we’re really proud to be open sourcing it.

Let me know if you have any questions, happy to discuss!

2 comments

> For example, in the prompt for this experiment, the model is bootstrapped with the correct Form 1040 lines and short instructions as part of its context.

Given that only short instructions are in context, I would not have expected even a frontier model to score well on this benchmark. For better results, I'd think that giving the model access to the entire tax code is required (which likely requires RAG due to its sheer size).
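To make the retrieval idea concrete, here is a toy sketch of what "fetch only the relevant slice of the instructions" could look like. It is not how the benchmark or any real product works. The passage texts are short paraphrased placeholders rather than official IRS wording, the function names (`retrieve`, `build_prompt`) are made up for illustration, and the scoring is plain word overlap standing in for a real embedding-based retriever.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, passage):
    # Count tokens shared between query and passage (bag-of-words overlap).
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def retrieve(query, passages, k=2):
    # Rank passages by overlap and keep the top k that match at all.
    ranked = sorted(passages, key=lambda p: score(query, p["text"]), reverse=True)
    return [p for p in ranked[:k] if score(query, p["text"]) > 0]

# Hypothetical placeholder corpus; a real one would hold the form instructions.
PASSAGES = [
    {"id": "line-1a", "text": "Wages reported on Form W-2 box 1"},
    {"id": "line-2b", "text": "Taxable interest from Form 1099-INT"},
    {"id": "line-16", "text": "Tax computed from taxable income using the tax tables"},
]

def build_prompt(line_description, passages, k=2):
    # Put only the retrieved passages into the model's context.
    hits = retrieve(line_description, passages, k)
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in hits)
    return f"Context:\n{context}\n\nCompute: {line_description}"
```

Calling `build_prompt("wages from W-2", PASSAGES, k=1)` would put only the `line-1a` passage in context, which is the point: the model sees a small, relevant excerpt instead of the whole tax code.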

We tested models with knowledge cutoffs in 2025, so we expect them to have knowledge of Tax Year 2024 forms in their weights. We also tested models with the ability to do web search to fetch any other forms they think are necessary: https://github.com/column-tax/tax-calc-bench

That said, we agree that richer context helps, and that is what we've built into our internal tax coding agent, Iris: https://www.columntax.com/blog/introducing-iris-our-ai-tax-d... (it can fetch just the right tax form context on a per-line basis to turn the tax law into code).

This topic is so American. In any other country, you wouldn't have to consult a tax expert to prepare a personal tax return.