by CaptainJack 468 days ago

I've used beancount extensively and spent many hours on it a few years ago. I built importers that parse bank PDFs (in the UK, Plaid doesn't work, and I'd rather keep all the original statement PDFs anyway).

Probably built 10+ importers, plus some plugins to do automated transaction annotations.

I have not made any updates for many years now, because:
- Downloading statements is still a pain; I have to manually go through all the websites. Banks are bad at making statements available, and worse at making it possible to automate.
- The root of the issue is actually that beancount is too slow. Any change/update takes ages. Python is both a blessing (it makes it easy to add plugins, importers, etc.) and a curse (it is way slower than some other languages).

I believe the creator of beancount has started working on v3 with a mix of C++/python, relying on protobufs, a C++ core for parsing, etc. AFAIK, that is not production-ready yet.

6 comments

chrislloyd 468 days ago

I have a very similar setup but with HLedger[1]. A "do-nothing"[2] script helps me download statements by opening bank websites, waits for manual import and finally checks balances. That makes it a lot less repetitive and error prone. Or at least, I catch the errors faster.
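A do-nothing script of the kind described above might look roughly like this. It is only a sketch: the steps, the `bank.example` URL, and the names `Step`, `STEPS`, and `run_steps` are made up for illustration and are not taken from the linked post or from any real setup.

```python
import webbrowser
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    instruction: str            # what the human should do
    url: Optional[str] = None   # page to open first, if any


# Hypothetical steps; the bank URL is a placeholder.
STEPS = [
    Step("Log in and download last month's statement.", "https://bank.example/login"),
    Step("Move the downloaded file into the statements/ folder."),
    Step("Run the importer and confirm the closing balance matches."),
]


def run_steps(steps, prompt: Callable[[str], str] = input,
              open_url: Callable[[str], object] = webbrowser.open) -> int:
    """Walk through each step, waiting for the human to confirm it is done."""
    done = 0
    for number, step in enumerate(steps, start=1):
        if step.url:
            open_url(step.url)  # open the page so the human does not have to find it
        prompt(f"[{number}/{len(steps)}] {step.instruction} Press Enter when done. ")
        done += 1
    return done
```

Calling `run_steps(STEPS)` walks through the list interactively. Nothing is automated at first; individual steps can be replaced by real code later, one at a time.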

I've found HLedger and Shake to be fast enough to process almost a decade of finances. Dmitry Astapov has an extremely well produced tutorial workflow[3].

How have you managed the PDF parsing? Mine has become a bit of a mess dealing with slight variations in formatting as they change over time. I've been considering using LLMs but have been nervous about quality.

[1]: https://hledger.org
[2]: https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-...
[3]: https://github.com/adept/full-fledged-hledger

Karrot_Kream 468 days ago

Why not spot check your PDF LLM outputs? I always make sure my accounts balance by hand anyway, though occasionally it's really painful, especially if it's a missing Venmo transaction. It's rare that I need to really comb through my accounts to account for some money, but when I do it's really time-consuming.
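One cheap spot check, whatever does the extraction, is to compare the statement's printed closing balance with the opening balance plus the extracted amounts. A minimal sketch, with made-up numbers and a hypothetical `check_statement` helper:

```python
from decimal import Decimal


def check_statement(opening, closing, amounts):
    """Return the stated closing balance minus the computed one (0 means they agree)."""
    computed = Decimal(opening) + sum((Decimal(a) for a in amounts), Decimal("0"))
    return Decimal(closing) - computed


# Made-up numbers for illustration only
extracted = ["-12.50", "-40.00", "1500.00"]
diff = check_statement("100.00", "1547.50", extracted)
print("OK" if diff == 0 else f"off by {diff}")
```

This catches a missing or misread amount, but not a wrong payee or two errors that happen to cancel out.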

tonyedgecombe 467 days ago

Why don’t you import CSV files rather than PDFs?

faustlast 467 days ago

I also use ledger/hledger to process a decade of finances. I reconcile once a year when doing taxes. I have multiple python scripts orchestrated with org-mode to generate reports/plots. I run them in separate processes since they are independent, which makes it fast enough (seconds).
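Running independent report scripts in separate processes can be sketched like this. The commands here are stand-ins that only print a line; real ones would be the actual report scripts, and the org-mode orchestration is not shown.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_report(command):
    """Run one report command in its own process and return (exit code, stdout)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout


def run_all(commands, max_workers=4):
    """Run independent commands concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_report, commands))


# Stand-in commands for illustration
commands = [
    [sys.executable, "-c", "print('income report')"],
    [sys.executable, "-c", "print('expense report')"],
]
results = run_all(commands)
```

The threads only wait on the child processes, so the work itself runs in parallel across processes.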

What is Shake?

FredPret 468 days ago

I suspect Python isn't the limiting factor here - it's the file format. You can end up with huge interconnected text files that have to be fully parsed on every change.

If you have 1e5 to 1e6 lines of transactions, I think a SQLite database would be a huge step forward. If you have much more than that, you probably need an ERP system.

Of course the text files make it ~easy to enter transactions, but maybe there's an elegant way to use those for ingestion only; that does make the system much more complicated to use. That might not be a problem for the kind of person using plain-text accounting over the course of years though.
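To make the SQLite idea concrete, here is a minimal sketch of a double-entry store. The schema, the table names, and the helpers `add_txn` and `balance` are assumptions for illustration, not anything beancount or hledger provides, and it assumes a single currency with amounts stored as integer cents.

```python
import sqlite3

SCHEMA = """
CREATE TABLE txn (
    id INTEGER PRIMARY KEY,
    date TEXT NOT NULL,
    payee TEXT
);
CREATE TABLE posting (
    txn_id INTEGER NOT NULL REFERENCES txn(id),
    account TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
CREATE INDEX posting_account ON posting(account);
"""


def add_txn(db, date, payee, postings):
    """Insert one transaction; refuse it unless the postings sum to zero."""
    if sum(cents for _, cents in postings) != 0:
        raise ValueError("postings do not balance")
    cur = db.execute("INSERT INTO txn (date, payee) VALUES (?, ?)", (date, payee))
    db.executemany(
        "INSERT INTO posting (txn_id, account, amount_cents) VALUES (?, ?, ?)",
        [(cur.lastrowid, account, cents) for account, cents in postings],
    )


def balance(db, account):
    """Sum all postings for one account, in cents."""
    row = db.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM posting WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_txn(db, "2024-01-05", "Grocer", [("Expenses:Groceries", 4250), ("Assets:Checking", -4250)])
add_txn(db, "2024-01-31", "Employer", [("Assets:Checking", 200000), ("Income:Salary", -200000)])
```

With an index on `account`, a balance query touches only the relevant rows instead of re-parsing every file. The text files could still be the input format, parsed once and then loaded with `add_txn`.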

mtlynch 468 days ago

v3 is out now and v2 is officially deprecated:

https://groups.google.com/g/beancount/c/iTdRuvZnE4E

I found the migration pretty confusing and haven't found good documentation on how to go from v2 to v3.

The best I've found is this unofficial write-up from an experienced Beancount user:

https://sgoel.dev/posts/moving-from-beancount-2x-to-3x/

CER10TY 468 days ago

As far as I can tell this is without the planned C++ rewrite though, and the documentation at https://beancount.github.io/ still says to use v2.

Is there a point in migrating already?

mtlynch 468 days ago

I'm still waiting on better migration instructions.

The maintainer says here that v2 is officially deprecated:

>You should not use v2 anymore.

https://groups.google.com/g/beancount/c/iTdRuvZnE4E/m/o9V91W...

diftraku 468 days ago

I'd be really curious on how hard programmatic access to your own, personal banking data might be in the PSD2-era.

I can link my secondary bank account to my main bank's app so I can see the balance in one place, but the catch is that I need to refresh this authorization through the app every 90 days.

Ideally, you'd just use your banking credentials to authorise the API access and pull data through that. What this requires in practice, I have no idea, but it probably involves a bit of bureaucracy.

Nextgrid 468 days ago

Some modern banks (Monzo, Starling, etc) give the account holder (read-only) access to their API.

If you can't, you can try using one of the open banking providers such as TrueLayer, Plaid, Nordigen (seems to have been acquired by GoCardless: https://gocardless.com/bank-account-data/), etc. Most have a free/dev tier that nevertheless allows connections to real accounts and might be enough for personal use.

Finally, screen-scraping is potentially an option. One of the few benefits of shifting everything to SPAs is that you generally have clean JSON APIs under the hood that are easier to interface with than "conventional" screen-scraping involving parsing HTML.

jazzyjackson 468 days ago

Ran into this annoyance recently setting up new accounting software, that the access my bank provides is last 6 months only, so I still had to go and export a csv, rejigger the column names and date format, to reimport the first 8 months of 2024.
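That kind of CSV clean-up (renaming columns and rewriting the date format) could be scripted roughly as follows. The column names in `COLUMN_MAP` and the `%d/%m/%Y` input format are assumptions for illustration; a real export will differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from a bank's column names to the ones the importer expects.
# One of the target names must be "date" for the date rewrite below to apply.
COLUMN_MAP = {"Transaction Date": "date", "Description": "payee", "Amount": "amount"}


def normalize_csv(text, column_map=COLUMN_MAP, date_format="%d/%m/%Y"):
    """Rename columns and rewrite dates as ISO 8601 (YYYY-MM-DD)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in csv.DictReader(io.StringIO(text)):
        renamed = {new: row[old] for old, new in column_map.items()}
        renamed["date"] = datetime.strptime(renamed["date"], date_format).date().isoformat()
        writer.writerow(renamed)
    return out.getvalue()


sample = "Transaction Date,Description,Amount\n31/01/2024,Coffee,-3.20\n"
print(normalize_csv(sample))
```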

My thought for working around tracking new transactions without a third party is to just set up email alerts so I get a notification on every charge, deposit etc and set up some cron job to read new emails and update my books.

BeetleB 468 days ago

> Downloading statements is still a pain, have to manually go through all websites.

Have you considered using Playwright?

I used aider[0] recently to log into my work's payslips and download all the relevant payslips into JSON format (with values encrypted). It took about 3 hours, but that's mostly because of my lack of knowledge of good CSS selectors.

jxjnskkzxxhx 468 days ago

Banks in the UK allow you to export transactions in many formats. Log in, pick a time range, download in OFX format. Why is this a pain?

erikerikson 468 days ago

It makes it about them not about you. I don't care which banks and other financial providers I use. I care about managing my funds in a way that is efficient and healthy for my life. The banks I use are simply service providers, a subclass of service providers across all the dimensions of my life. They have regulations they must abide by but in so doing they attempt to force me to think and act in those terms and I think they're poor.

jxjnskkzxxhx 468 days ago

The fact that I can download my data describing my transactions in a format convenient to me, makes it about them? Curious take.

erikerikson 468 days ago

You're justified in writing that mine is a weird (I know you wrote curious but I'll go further) take. I expanded my thinking beyond the scope you referenced but at least in part, yes. I have to log in to their interface, figure out and drill into where they offer that download. I don't want their branding and I don't want to manage my money in the way they constrain me to. I don't really want multiple accounts or to play interest maximization games by acting fiddly, for example. At the highest level, I think about my flows in percentage allocations. 10% for charity, 10% for my kid's college (until there was enough), 10% for savings, and so forth. As presented, I'm constantly fiddling with specific money amounts for automation and stuck in the ledger rather than with sources and sinks with tagging-based allocation and management policies. I'm ignoring the horsecrap around the ways they bilk the poor, which is a separate kind of evil.

cranky908canuck 468 days ago

"banks allowing export of transactions" is only the start.

I deal with two banks for credit cards.

One (call it "Blue Bank") allows me to download a statement. I filter out a couple of things (payments mostly), check that it matches the paper statement balance, and post it. About 15 minutes start to finish.

The other (call it "Orange Bank") allows me to download a "statement". I filter out a couple of things (payments mostly), check my previous month's transactions to see which ones at the beginning of the file actually go in the current billing period (not already paid), stare at the last transactions to see which ones actually were posted to the current billing period (not after the cutoff), run the script to check the total (nope, doesn't match) then do that a couple of times until it matches. The time they changed the meaning of the "credit" column from "just confirming this is a credit" to "it's a credit, you need to flip the sign" it was 45 minutes.

But hey, it's all CSV!
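The "Orange Bank" routine above (drop payments, keep only rows posted inside the billing period, flip the sign on credits, compare the total) can be sketched as one function. The column names and the sample rows are invented for illustration and do not match any real bank's export.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def statement_total(csv_text, period_start, period_end, flip_credits=True):
    """Sum the charges posted within one billing period, skipping payments."""
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] == "payment":
            continue  # payments are not part of the new charges
        posted = date.fromisoformat(row["posted"])
        if not (period_start <= posted <= period_end):
            continue  # belongs to a neighbouring billing period
        amount = Decimal(row["amount"])
        if row["type"] == "credit" and flip_credits:
            amount = -amount  # assumes the export lists credits as positive numbers
        total += amount
    return total


# Made-up export; the column names are assumptions.
export = """posted,type,amount
2024-02-28,charge,10.00
2024-03-02,charge,25.00
2024-03-10,credit,5.00
2024-03-15,payment,100.00
2024-04-01,charge,7.00
"""
total = statement_total(export, date(2024, 3, 1), date(2024, 3, 31))
```

The `flip_credits` switch is there for exactly the kind of change described: when the bank redefines the credit column, only that flag needs to change.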

jxjnskkzxxhx 468 days ago

I guess you must have a more complex life than me. I never filter out anything, and everything always matches.

cranky908canuck 468 days ago

Maybe. What I was trying to get at was, some banks (the 'orange one') don't provide sane semantics, so even if the input format is compatible, reconciliation can be a nightmare. You may not be dealing with your 'orange bank'; if I only dealt with the blue one I would not be aware of the problems of the other (and it would not have occurred to me that the orange one could botch it up).

BeetleB 468 days ago

Multiple bank accounts and multiple credit cards. Also, figuring out the time range for each bank.

mgr86 468 days ago

I run into the same issues here with banks in the US. It is a real pain in the ass, and makes tracking this sort of information way more time-consuming than it needs to be.

My other issue is with stores like Costco that sell household goods, groceries, clothes, and even misc kids' stuff. I like to track each separately, which means I then need to fetch and analyze the receipts.

BeetleB 468 days ago

> I like to track each separately. Which means I then need to fetch and analyze the receipts.

That is a reality. To make my life easier, when I check out at a store, I put all my grocery items first on the belt. Then everything else. Usually "everything else" is only a few items. So I categorize those additional items, and then specify "Groceries" for the rest.

Often I buy only groceries, and I throw those receipts away. When I'm in a ledger/beancount session, if I don't have a receipt, that means it was just Groceries.

This method alone really reduced my time dealing with receipts.
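The arithmetic behind that shortcut is simple: itemise the few non-grocery lines and let the remainder default to Groceries. A small sketch with invented amounts and account names, ignoring per-item tax:

```python
from decimal import Decimal


def split_receipt(total, other_items, default_account="Expenses:Groceries"):
    """Assign itemised amounts to their accounts and the remainder to the default."""
    postings = {}
    for account, amount in other_items:
        postings[account] = postings.get(account, Decimal("0")) + Decimal(amount)
    remainder = Decimal(total) - sum(postings.values(), Decimal("0"))
    if remainder < 0:
        raise ValueError("itemised amounts exceed the receipt total")
    if remainder:
        postings[default_account] = postings.get(default_account, Decimal("0")) + remainder
    return postings


# Made-up receipt: two non-grocery items, everything else is groceries
postings = split_receipt("84.37", [("Expenses:Clothes", "19.99"), ("Expenses:Household", "6.49")])
```

A receipt with no itemised lines, `split_receipt("10.00", [])`, puts the whole amount on the default account, which matches the "no receipt means Groceries" rule.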

cranky908canuck 468 days ago

For planning purposes, could you look at a year's postings, then come up with "good enough" breakout allocations going forward?
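As a rough sketch of that idea: compute each category's share from a year of categorised postings, then apply those shares to a new, uncategorised total. The history below is invented, and `category_shares` and `allocate` are hypothetical helpers, not part of ledger or beancount.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def category_shares(postings):
    """Return each category's share of total spending as a fraction of 1."""
    totals = defaultdict(Decimal)
    for category, amount in postings:
        totals[category] += Decimal(amount)
    grand_total = sum(totals.values(), Decimal("0"))
    return {category: amount / grand_total for category, amount in totals.items()}


def allocate(total, shares):
    """Split a new total by the given shares, rounding to cents."""
    cent = Decimal("0.01")
    amounts = {c: (Decimal(total) * s).quantize(cent, ROUND_HALF_UP) for c, s in shares.items()}
    # Put any rounding leftover on the largest category so the parts sum to the total.
    largest = max(amounts, key=amounts.get)
    amounts[largest] += Decimal(total) - sum(amounts.values(), Decimal("0"))
    return amounts


# Made-up history for one store
history = [("Groceries", "600.00"), ("Household", "300.00"), ("Clothes", "100.00")]
shares = category_shares(history)
split = allocate("250.00", shares)
```

This trades accuracy on any single receipt for not having to read receipts at all, so it suits planning more than exact bookkeeping.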