Ask HN: Keep Last X entries per ID in larger DB
2 points
by matttah
2703 days ago
I'm trying to figure out the best solution for the following:
* Daily import of 500+ million rows of data, covering ~250 million unique IDs.
* I only need to keep the latest X entries per unique ID. Once an ID has X entries, older ones are discarded.
* Once a month, the entire dataset is read out for processing.

X can be anywhere from 1000 to 3000. It is the same across the entire DB; the exact value just depends on what we determine to be the best setting. Since I don't access the data more than once a day, or at the end of the month, I would prefer not to pay for storage. There are over a billion unique IDs, which I can partition by prefix or range. Each individual entry per ID is fairly small, with only an integer and two decimals stored.

What would you recommend as a data store for this? Thanks!

Insert each fresh record immediately, since you know it’s recent. After a successful insert, enqueue a request to check that ID and discard anything beyond its latest X entries.
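
Roughly what I mean, as a minimal in-memory sketch. Everything here is a stand-in and an assumption on my part: `store` is a plain dict where the real data store would be, `trim_queue` is a deque where a real job queue would be, and the names `insert` and `run_trim_worker` are made up for illustration. X is set to 3 only to keep the example short.

```python
from collections import defaultdict, deque

X = 3  # hypothetical limit for the demo; the post puts the real value at 1000 to 3000

store = defaultdict(list)  # stand-in for the data store: id -> entries, oldest first
trim_queue = deque()       # stand-in for a real job queue


def insert(entry_id, entry):
    """Write the new entry right away, then queue a trim check for its ID."""
    store[entry_id].append(entry)
    trim_queue.append(entry_id)


def run_trim_worker():
    """Drain the queue, keeping only the newest X entries for each queued ID."""
    while trim_queue:
        entry_id = trim_queue.popleft()
        excess = len(store[entry_id]) - X
        if excess > 0:
            # Entries are appended in arrival order, so the oldest sit at the front.
            del store[entry_id][:excess]


# Each entry is an integer and two decimals, as described in the post.
for n in range(5):
    insert("id-1", (n, 0.5, 1.5))
insert("id-2", (0, 0.5, 1.5))
run_trim_worker()
```

The point of splitting it this way is that the write path never waits on cleanup. An ID can briefly hold more than X entries until the worker gets to it, which should be fine when the full read only happens monthly.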