by cocoablazing
3011 days ago
What’s the advantage of the file system/repo/bespoke diag database over storing the numpy arrays in the existing database infrastructure? Doesn’t implementing this system with HDF5 cause headaches for concurrency in either direction?
1) If you've got models that are regenerated periodically based on new inputs or algorithm tweaks, you can end up with quite a few of them as you scale.
2) If you want to track down or debug why your production system made a given decision, you need to log not just the model but all of the parameters that went into that decision. If that type of decision happens many times a day, you can end up with some pretty massive logs to go through.
In either case, storing that historical data in your transactional database can put a real load on it, so it's ideal to keep it separate once you get any kind of volume. I've actually bumped into 2) at one job.
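To make 2) a bit more concrete, here's a rough sketch of what "keep it separate" could look like: each decision gets appended to a JSON-lines file outside the transactional database, one file per model version. Everything here (`log_decision`, `load_decisions`, the file naming, the record fields) is hypothetical and only for illustration. It is not the setup from the article, and it ignores rotation, compression and concurrent writers.

```python
import json
import tempfile
import time
from pathlib import Path


def log_decision(log_dir, model_version, inputs, decision):
    """Append one decision record to a per-model-version JSON-lines file.

    Hypothetical helper: the record layout is an assumption, not a standard.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.time(),               # when the decision was made
        "model_version": model_version,  # which regenerated model was used
        "inputs": list(inputs),          # parameters that went into the decision
        "decision": decision,
    }
    path = log_dir / f"decisions-{model_version}.jsonl"
    # Append-only writes keep this load off the transactional database.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def load_decisions(path):
    """Read every logged decision back, e.g. when debugging one of them."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = log_decision(tmp, "v1", [0.1, 0.2], "approve")
        log_decision(tmp, "v1", [0.3, 0.4], "reject")
        for row in load_decisions(p):
            print(row["model_version"], row["inputs"], row["decision"])
```

With numpy arrays you'd convert them with `.tolist()` before logging, or save the array to its own file and log only the path. Either way the historical data sits next to the production database rather than inside it.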