Hacker News new | ask | show | jobs
by madenine 2376 days ago
SELECT TransactionID, Amount, COUNT(*) FROM Transactions GROUP BY TransactionID, Amount

I'll take 10% of savings

5 comments

This presumes all of the transaction data is centralized one place. 99% of the battle was probably getting that data aggregated, then performing the analytics, which is chalked up to this simple SELECT statement.
This is exactly the issue here. I worked as a contractor for the State Department for a couple of years on software that basically held the record of every dollar spent over the last 20 years.

What seemed like a very small and simple problem initially revealed itself to be massive problem that even a team of 35 struggled to maintain. The purchase history for any given item spanned multiple systems with completely different topologies of data glued together by, literally, tens of thousands of lines of SQL.

The project had been around for ~17 years when I worked on it and while data was landed in a final format where a query like the above could have been done, I wouldn't bet any serious money that the calculations were correct.

Some five or six multi-million dollar rewrites had been attempted, but could never be done.

Not to say Ohio's system is that this level, but I doubt there's a giant table sitting somewhere that such a simple query could be applied to.

I like to think that somewhere in all that SQL was an Office Space-esque line of SQL syphoning off a few cents per transaction, but was never able to find one :)

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"Don't be snarky."

https://news.ycombinator.com/newsguidelines.html

HAVING COUNT(*) > 2;

I'll take 9% );

HAVING COUNT(*) >= 2;

Back to 10%

Add a date field or you might get legitimate duplicates if the payment amounts are the same across months.
They shouldn't have the same transactionID then, no?
Wouldn't think so. Each transaction ID would represent a unique transaction?

>Other double payments made by mistake included times that the state received multiple invoices.

I would assume each paid invoice was a new transaction ID. The real problem seems like there are two invoices being paid. Not that there are two transactions (which seems like just a symptom). It's possible that each invoice even has it's own ID.

`>=`

8%?

And when there are two payments with different transaction ids from the gateway?
most likely? my joke fails to pan out in reality
I would guess that "TransactionID" is where a lot of the challenge lies.