by tominous
430 days ago
In my case I had hundreds of invoices in a not-very-consistent PDF format, which I had tracked contemporaneously in spreadsheets. After data extraction (pdftotext + OpenAI API), I cross-checked the results against the spreadsheets, and for any discrepancies I reviewed the original PDFs and old bank statements. The main issue was that it was surprisingly hard to get the model to consistently strip commas from dollar values, which broke the CSV output I asked for. I gave up on prompt engineering it to perfection and just looped around it with a regex check. Otherwise, accuracy was extremely good, and it surfaced a few errors that had crept into my spreadsheets over the years.
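For anyone wondering what "looping around it with a regex check" might look like, here is a rough sketch of one reading of it, not the exact code from that project. It assumes each extracted row has three fields (vendor, date, amount) and that amounts carry a leading `$`; the `extract` callable is a hypothetical stand-in for the pdftotext + model call, and all names are made up for illustration.

```python
import re

# Assumption: dollar amounts carry a leading "$" and use "," only as a
# thousands separator, e.g. "$1,234.56" or "$12,345,678.90".
AMOUNT_WITH_COMMAS = re.compile(r"\$\d{1,3}(?:,\d{3})+(?!\d)(?:\.\d+)?")

# Assumption: every row is vendor,date,amount.
EXPECTED_FIELDS = 3


def strip_amount_commas(line: str) -> str:
    """Drop thousands separators inside dollar amounts only.

    Commas that separate CSV fields are left untouched.
    """
    return AMOUNT_WITH_COMMAS.sub(lambda m: m.group(0).replace(",", ""), line)


def is_valid_row(line: str) -> bool:
    """True if the row has the expected field count and no comma'd amounts."""
    has_right_width = len(line.split(",")) == EXPECTED_FIELDS
    return has_right_width and AMOUNT_WITH_COMMAS.search(line) is None


def extract_with_retry(extract, text: str, max_attempts: int = 3):
    """Call the (hypothetical) extractor until its output passes the check.

    Returns the cleaned row, or None if every attempt fails validation.
    """
    for _ in range(max_attempts):
        row = strip_amount_commas(extract(text).strip())
        if is_valid_row(row):
            return row
    return None
```

The regex only fixes the unambiguous cases. A row where a bare `$5` field is followed by a three-digit field would be merged by mistake, so the field-count check and the manual cross-check against the spreadsheets still matter.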
Everyone has a story of a CSV formatting nightmare.