Interesting thought. We ran into the same issue while working on a recommendation engine and previously tried to build a solution around it. Curious to know what’s driving your interest in post-analysis?
Also, happy to share what we learned from working with a social media customer. For them, the key motivation was to understand where the model is failing and hence how to improve it. They started with offline experiments focused on improving AUC, but that curve saturates pretty quickly. An incremental model improvement may not produce a sizeable change in offline metrics, yet it can still affect retention for a certain user group and hence overall revenue. They are using us to get insights from online experimentation. They have defined custom measures to monitor the distribution of model outputs, and can detect differences in the model's performance much more effectively than by relying on changes in business metrics like retention, revenue, etc. This helps them find poorly performing cohorts and roll out model improvements in weeks, not months. Would also love to hear what issues you ran into.
Oh, to be specific, the platform was oriented towards test practice and our objective was to recommend questions in a sequence.
One good strategy that correlated with session length for us was asking questions that were neither too difficult nor too easy, based on what we knew of the user's level at that instant. The post hoc analysis was really meant to dig into multiple user sessions, see whether the current method was working, and evaluate counterfactual strategies.
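To make the "neither too difficult nor too easy" idea concrete, here is a minimal sketch, illustrative only and not the logic the platform actually ran. It assumes a 1PL (Rasch-style) logistic model for the chance of a correct answer and a hypothetical target success rate of 0.7. The function names, the difficulty scale, and the target value are all assumptions made for the example.

```python
import math


def p_correct(ability, difficulty):
    """Assumed 1PL logistic model: chance the user answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_next_question(ability, difficulties, asked=(), target=0.7):
    """Pick the unasked question whose predicted success rate is closest to target.

    ability      -- current estimate of the user's level (hypothetical scale)
    difficulties -- dict mapping question id -> difficulty on the same scale
    asked        -- ids already shown in this session
    target       -- assumed "not too hard, not too easy" success rate
    """
    candidates = {q: d for q, d in difficulties.items() if q not in asked}
    if not candidates:
        return None  # nothing left to recommend
    return min(
        candidates,
        key=lambda q: abs(p_correct(ability, candidates[q]) - target),
    )


# Example: three questions, user of average ability.
bank = {"q1": -2.0, "q2": 0.0, "q3": 1.5}
next_q = pick_next_question(ability=0.0, difficulties=bank, target=0.5)
```

Replaying logged sessions through `pick_next_question` with a different `target` is one simple way to frame a counterfactual strategy, though the logs only show outcomes for the questions that were actually asked.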
I imagine the Uptrain product could help us segment user cohorts, find out which ones aren't performing well, etc. Would love to hear what you ended up building too.