

	by Fiveplus 174 days ago
	This feels like the peak of resume driven development. The maker of this has taken a deterministic problem (substring matching transaction descriptions) that could be solved with a 50-line Python script or a standard .rules file and injected a non-deterministic, token-burning probability engine into the middle of it. I'll stick to hledger and a regex file. At least I know my grocery budget won't hallucinate into "Consulting Expenses" because the temperature was set too high.
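For reference, a minimal sketch of the kind of deterministic matcher this comment describes. The patterns, account names, and the `categorize` helper are illustrative assumptions, not taken from hledger or from the tool under discussion:

```python
import re

# Hypothetical rule table: (pattern, account). The first matching rule wins.
RULES = [
    (re.compile(r"WHOLE\s*FOODS|TRADER JOE", re.I), "Expenses:Groceries"),
    (re.compile(r"SHELL|CHEVRON", re.I), "Expenses:Fuel"),
    (re.compile(r"NETFLIX|SPOTIFY", re.I), "Expenses:Subscriptions"),
]


def categorize(description, default="Expenses:Unknown"):
    """Return the account of the first rule whose pattern matches."""
    for pattern, account in RULES:
        if pattern.search(description):
            return account
    return default
```

The same input always yields the same account, which is the determinism the comment is after. The cost is that every new merchant spelling needs a new pattern.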

15 comments

senko 174 days ago

> a deterministic problem (substring matching transaction descriptions) that could be solved with a 50-line Python script

I don't know about your bank transactions, but at least in my case the descriptions are highly irregular and basically need a hardcode for each and every damn POS location across each of the store locations across each vendor.

I attempted that (with a Python script), gave up, and built myself a receipt tracker (photo + gemini based ocr) instead, which was way easier and is more reliable, even though - oh the horror! - it's using AI.

noufalibrahim 174 days ago

Isn't this suitable for a Bayesian classifier? Label some data (manual + automation using substrings) and use that to train a classifier and then it should be able to predict things for you fairly well.

There's a new feeling that I experience when using LLMs to do work. It's that every run, every commit has a financial cost (tokens). Claude code can write a nice commit message for me but it will cost me money. Alternatively, I can try to understand and write the message myself.

Perhaps the middle ground is to have the LLM write the classifier and then just use that on the exported bank statements.
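A rough sketch of the label-then-train idea, using a small multinomial naive Bayes classifier written against the standard library only. The class name, the tokenizer, and the add-one smoothing with one extra slot for unseen tokens are assumptions made for illustration. This is not how GnuCash or any particular tool implements it:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(description):
    # Lowercase alphabetic tokens only; digits (store numbers, ref ids) are dropped.
    return re.findall(r"[a-z]+", description.lower())


class NaiveBayes:
    def __init__(self):
        self.label_counts = Counter()             # labelled examples per account
        self.token_counts = defaultdict(Counter)  # token frequencies per account
        self.vocab = set()

    def train(self, description, label):
        self.label_counts[label] += 1
        for tok in tokenize(description):
            self.token_counts[label][tok] += 1
            self.vocab.add(tok)

    def predict(self, description):
        """Return the most probable label, or None if nothing was trained."""
        tokens = tokenize(description)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            # Add-one smoothing, with one extra slot reserved for unseen tokens.
            denom = sum(self.token_counts[label].values()) + len(self.vocab) + 1
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training data can come from manual labels or from whatever substring rules already exist. `predict` then covers descriptions that no rule matches.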

senko 174 days ago

> Isn't this suitable for a Bayesian classifier? Label some data (manual + automation using substrings) and use that to train a classifier and then it should be able to predict things for you fairly well.

Sure, maybe, if I label enough data? At the number and variety of transactions I do, it wouldn't be much better than hardcoding.

> It's that every run, every commit has a financial cost (tokens).

Ballpark figure for total cost for the Gemini OCRs for now (for me and a few hundred other people who have downloaded the app), for the past 6 or so months, is a few minutes of my hourly rate.

Absolutely not worth the manual grind for me.

ozim 174 days ago

Getting LLM to write the classifier should be the way to go.

That’s what I mostly do: I give it some examples and ask it to write code to handle stuff.

I don’t just dump data into LLM and ask for results, mostly because I don’t want to share all the data and I make up examples. But it also is much cheaper as once I have code to process data I don’t have to pay for processing besides what it costs to run that code on my machine.

sblom 163 days ago

That's a great idea. It's also what Tally does.

xtiansimon 174 days ago

> "Perhaps the middle ground is to have the LLM write the classifier..."

There was a time when I'd read this comment and then go looking for a tutorial on building a basic "Bayesian classifier". Invariably, I'd find several, which I'd line up like soldiers in tabs, and go through them until I found one that explained the what, why and how of it in a way that spoke to my use (or near enough).

Of course, now ChatAI does all that for you in several Chat sessions. One does wonder though, if Chat is trained on text, and that was the history of what text was available, 10 years from now after everyone stopped writing 10 Blog posts about the same "Bayesian classifier", where's the ChatAI fodder coming from? I don't even know if this would be an outcome of fewer Blog posts [1]. It just strikes me as interesting because that would be _a very slow process_.

[1]: Not that this is necessarily true. People write blogs for all sorts of reasons, and having knockout quality competition from ChatAI does not KO all of them.

mimimi31 174 days ago

>Isn't this suitable for a Bayesian classifier?

I think that's what GnuCash does by default. Even with years of past transaction data it still gets some very obvious matches wrong for me. In my experience it's about 90% accurate for the ones it really should be able to do based on the training data.

xtiansimon 174 days ago

> "...it's about 90% accurate for the ones it really should be able to do based on the training data."

What's the pathway for the remaining 10%? Are they simply misclassified, and dropped into a queue for manual labeling? Do the outliers get managed by the GnuCash? Or do they get dumped into a misc 9000 account?

mimimi31 173 days ago

It shows you the automatic account matches on import, allowing you to double-check and correct any misclassified ones.

xtiansimon 173 days ago

Ok. So what you're pointing to is not an automated pipeline, but a user mediated process. It's the same pattern in QuickBooks, or whatever ERP.

yunohn 174 days ago

IMHO the better middle ground is to use a nice (potentially fine tuned) small model locally, of which there are many now thanks to Chinese AI firms.

ithkuil 174 days ago

An expensive model can generate the training dataset

conradev 174 days ago

You can likely just take a small open weight language model and use it like a classifier quite easily.

throwaw12 174 days ago

That dude is a Distinguished Engineer at Microsoft, doesn't need your "resume driven" label, his resume is good enough already.

Why don't you accept it as, dude is experimenting and learning new tool, how cool is that, if this is possible, what else can I build with these tools?

devsda 174 days ago

Maybe not resume driven. But hearing MS and AI, I can't help but wonder if this is the result of one of those mandates by "leadership" where everyone is forced to come up with an AI use case or hack.

throwaw12 174 days ago

Isn't this exactly the point of innovation and mandates?

"leadership" or real leaders, want people to experiment a lot, so some of them will come up with novel ideas and either decide to build it on their own and get rich or build internally and make company rich.

Not always, but in many cases when someone becomes rich with innovation, it is probably because there was a benefit to a society (excluding gambling, porn, social media addictions)

stoneforger 174 days ago

Because there was a benefit for some shareholder somewhere, maybe.

bdcp 167 days ago

It's insane to expect them to go rogue and not benefit the company in some sense

supriyo-biswas 174 days ago

The pressure at those levels is even higher, as it is an unsaid expectation of sorts that LLMs represent the cutting edge of technology, so principals/DEs must use it to show that they're on the top of the game.

andy99 174 days ago

No idea if this is true but very sad if it is. This is a great argument for the concept of tenure, so experts can work on what they as experts deem important instead of being subject to the whims of leadership. I, probably naively, pictured Distinguished Engineer to be closer to that, but maybe not.

yolo3000 174 days ago

It's in the career framework of most big techs to use AI this year, so everyone is doing it to hold on to their bonuses.

RevEng 174 days ago

Sadly, yes, it's true. New AI projects are getting funded and existing non-AI projects are getting mothballed. It's very disruptive and yet another sign of the hype being a bubble. Companies are pivoting entirely to it and neglecting their core competencies.

throwaw12 174 days ago

Fair, but it doesn't mean some of them aren't genuinely experimenting and figuring out interesting ways to use LLMs. Some examples I personally love and admire:

* simonw - Simon Willison, he could just continue building datasette or help Django, but he started exploring LLMs

* Armin Ronacher

* Steve Yegge

and many more

greatgib 174 days ago

Currently Microsoft is eliminating a lot of the useless fat in redundancy plans. So the crappy "resume driven" thing might actually be needed.

weird-eye-issue 174 days ago

That sounds exactly like the type of person that would care about their resume.

zwnow 174 days ago

Microsoft (the company with no noteworthy accomplishments within the past decades) is a metric for a resume being good now?

All they do is buy out companies and make an already finished product theirs.

bitwize 174 days ago

Oh he's from Microsoft? That makes malarkey like this track so much more.

dimitri-vs 174 days ago

In theory, yes. In practice the shit data you are working with (descriptions that are one or two words, or the same word with a ref id) really benefits from an agent that a) understands who you are and what you are likely spending money on, and b) has access to tool calls to dig deeper into what `01-03 PAYPAL TX REFL6RHB6O` actually is by cross-referencing an export of PayPal transactions.

I think the smarter play is having an agent take the first crack at it, and build up a high confidence regex rule set. And then from there handle things that don't match and do periodic spot checks to maintain the rule set.
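One way the rules-first half of that workflow might look, as a sketch only. `apply_rules` and `spot_check_sample` are hypothetical helper names, and the agent that proposes new rules from the unmatched rows is deliberately left out:

```python
import random
import re


def apply_rules(rules, descriptions):
    """Split descriptions into (matched, unmatched) using first-match-wins rules.

    `rules` is a list of (regex pattern string, account) pairs.
    """
    matched, unmatched = [], []
    for desc in descriptions:
        for pattern, account in rules:
            if re.search(pattern, desc, re.IGNORECASE):
                matched.append((desc, account))
                break
        else:
            # No rule fired: leave this row for the agent or for manual labelling.
            unmatched.append(desc)
    return matched, unmatched


def spot_check_sample(matched, k=5, seed=None):
    """Pick up to k already-matched rows for periodic manual review."""
    rng = random.Random(seed)
    return rng.sample(matched, min(k, len(matched)))
```

Only the `unmatched` list needs the expensive path. The spot-check sample is there to catch rules that have quietly gone stale.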

xtiansimon 174 days ago

> "...then from there handle things that don't match..."

Curious, what are the inputs for an agent when handling your dataset? What can you feed the agent so it can later "learn" from your _manual labeling_?

nerdzoid 174 days ago

I think you don't use UPI, or you would have understood the pain point: categorizing even with AI would be difficult.

minne 174 days ago

The maker built NuGet. He doesn't need a resume.

pansa2 174 days ago

But can he invert a binary tree?

junto 167 days ago

Agreed but not just NuGet. David also was a key driver for SignalR and now Aspire (which is genuinely the most awesome tool I’ve seen for a while). He’s also extremely humble and doesn’t need to try to impress anyone.

I think it’s clear that he’s just doing this tool for fun and chose to share it. People shouldn’t mix up their anti-Microsoft autoeroticism and a person that happens to work for them.

dizhn 174 days ago

> The maker of this has taken a deterministic problem (substring matching transaction descriptions) that could be solved with a 50-line Python script

I had a coding agent write this for me last week. :D

It takes an excel export of my transactions that I have to download obviously since no bank is giving out API access to their customers. It uses some python pandas and excel stuff and streamlit to classify the transactions and "tally" results and show on the screen as color coded tabular data. (Streamlit seems really nice but super super limited in what it can do.) It also creates an excel file (with same color coding) so I can actually open it up and check if necessary. This excel file has dropdowns to reclassify rows. The final excel file also has formulas in place to update live. Code can also compare its own programmatic calculations with the result from formulas from excel. Why not? My little coding sweatshop never complains. (All with free models and clis by the way. I haven't had a reason to try Claude yet.)

xtiansimon 174 days ago

> "...no bank is giving out API access to their customers..."

I think Citi bank has an API (https://developer.citi.com/). Not that it's public for account holders, but third-parties can build on that. I'm looking at Plaid.com

One thing about Plaid--I've not been happy when encountering Plaid in the wild. For example, when a vendor directs me to use Plaid to validate a new bank account integration. I'd much rather wait a few days and use the 0.01 deposit route. Like using a katana for shaving.

But signing up to use Plaid for bank transaction ingestion via API is a whole different matter.

init 174 days ago

I've built and worked on this exact problem before at bigtech, startup and personal projects.

Regex works well if you have a very limited set of sender and recipient accounts that don't change often

Bayesian or DNN classifiers work well when you have labeled data.

LLMs work well when you have a lot of data from lots of accounts.

You can even combine these approaches for higher accuracy

hrimfaxi 174 days ago

It seems like using a model to create regexes that match your transactions might be worthwhile.

ManuelKiessling 174 days ago

Yeah, a pattern like "do the heavy lifting with cheap regexes, and every 100 line items, do one expensive LLM run comparing inputs, outputs, and existing regexes to fine-tune the regexes".
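A sketch of that batching pattern, assuming a hypothetical `refine` callback that stands in for the LLM run. No real model API is called here, and the function name and signature are made up for illustration:

```python
import re


def process(descriptions, rules, refine, batch_size=100):
    """Categorize with regex rules, calling refine(rules, batch) once per full batch.

    `refine` is a hypothetical hook standing in for the expensive LLM run: it
    receives the current rules plus the batch's (description, account-or-None)
    pairs and returns the updated rule list.
    """
    results, batch = [], []
    for desc in descriptions:
        # Cheap path: first regex that matches decides the account.
        account = next(
            (acct for pat, acct in rules if re.search(pat, desc, re.I)), None
        )
        batch.append((desc, account))
        if len(batch) == batch_size:
            rules = refine(rules, batch)  # one expensive call per batch
            results.extend(batch)
            batch = []
    results.extend(batch)  # a trailing partial batch is returned unrefined
    return results, rules
```

Rows that were unmatched before a refinement stay unmatched in `results`. Re-running them against the returned rules is left to the caller.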

qaboutthat 174 days ago

Charitably, this is a very naive take for unstructured bank / credit card transactions. Even if you use a paid service for elaboration you will not write a 50-line, or even 500-line, list of declarative rules to solve this problem.

conradev 174 days ago

I imagine a coding agent would be great at editing your regex file to maximize coverage.

Just like manually editing Sieve/Gmail filters: I want full determinism, but managing all of that determinism can be annoying…

makach 174 days ago

This is an actual hard problem he is trying to fix. 50-line python? Pfff..! My current personal 400-line rule script begs to differ not to mention the PAIN of continuously maintaining it. I was looking into using AI to solve the same problem but now I can just plug and play.

froggertoaster 167 days ago

Lmao I'm no David Fowler fan (leftist blowhard), but he's one of the most talented and successful engineers at Microsoft. I don't think he needs to build a resume.

andy99 174 days ago

And this feels like peak HN “why not just use a regex”.

This is a hard, for all intents and purposes non deterministic problem.

Now if you’ll excuse me I have to draft another post admonishing the use of

  curl | sh

zwnow 174 days ago

Can't wait for people building agent-based grep or whatever other solved problems there are. AI people really need to touch grass.

alwillis 173 days ago

grep and ripgrep are used by Claude Code; I suspect other agents are doing something similar.

zwnow 173 days ago

It's not about whether the agent uses it but about some person building an agent-based grep. It was a joke.

ezst 174 days ago

Dude, come on, we won't be reaching "AGI" with that attitude. /s