Hacker News
by Rudybega 247 days ago
I wonder if you could dramatically improve these results with some relatively simple scaffolding and tool access.

If a ton of these mistakes are genuinely simple calculation errors, it seems like giving the models access to a calculator tool would help a fair bit.
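
To make the calculator idea concrete, here is a minimal sketch of what such a tool could look like on the scaffolding side. It assumes the model emits a plain arithmetic expression as a string and the harness evaluates it exactly. The function name `calculate` and the set of supported operators are my own illustrative choices, not any vendor's actual tool-calling API.

```python
import ast
import operator

# Hypothetical calculator tool: names and supported operators are
# illustrative assumptions, not part of any real tool-calling API.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node):
    """Recursively evaluate a parsed arithmetic expression node."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    # Anything else (names, calls, attributes) is rejected outright.
    raise ValueError("unsupported expression")


def calculate(expression: str):
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


if __name__ == "__main__":
    print(calculate("(1200 - 200) / 4"))
```

Parsing with `ast` instead of calling `eval` means the model can only trigger arithmetic, never arbitrary code. This only covers the arithmetic step; it does nothing to help the model decide which numbers to plug in.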

3 comments

We agree; that's the thesis behind our tax development coding agent: https://www.columntax.com/blog/introducing-iris-our-ai-tax-d...
The problem is that they do not understand what to calculate or how to calculate it, not the actual act of adding or multiplying. I tried asking ChatGPT to calculate some taxes for three countries, in two of which I already file taxes. For those two, ChatGPT gave wildly wrong numbers (not even in the right ballpark), so I knew I could not trust its numbers for the third, which was the one I was mostly interested in.
I feel like we are already there. I would imagine that if you gave this task to Claude Code or Codex, running in the CLI, you would see a huge improvement, and that is before you start creating task-specific guardrails.

I’m surprised they haven’t tried this. I’m running my own in parallel against my accountant in this way.