> With that data, you can extract “ground truth” by looking at the draft picks made by the best players on the service (sorted by win rate).

Do you mean that you are looking at the draft picks from https://www.17lands.com/leaderboard and then sorting by Win Rate? Didn't you mean to choose Match Wins or Trophies? Otherwise, you're not measuring the best players on the service. You're training on draft choices where most choices were very good; i.e., win rate sort will show you the luckiest players, not the best ones. That will naturally show up in any validation or testing you do too.

Shouldn't this be compared not to an LLM baseline, but to a baseline where an "Elo" style score is computed for each card compared to others from the 17lands data; then, until you have two colors, suggest the best scoring card, or when you do have color(s), suggest the best scoring card within that color or a land? I think it is possible for the LLM to have some semblance of rules knowledge, but it is more likely that it is picking up on card rarity, costs and "Big" more than anything else for unseen cards.

Your "accuracy" on the draft seems poor. I'm not sure it means what you think it means. Are you saying that when looking at the high win rate choices, where all the choices were mostly good, you happened to pick the choice that isn't the same as the player who originated the data? It actually seems harder to make a choice among all good choices. Anyway, there is quite a bit going on here.
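The Elo-style baseline proposed in the comment above could look roughly like the minimal sketch below. It is not from the thread, and every input shape is an assumption: pick records as hypothetical `(picked_name, pack_names)` pairs, a hypothetical `Card` type with a color set and a land flag, and a `committed_colors` argument standing in for "until you have two colors" (how commitment is decided is left open). Each pick is treated as the picked card winning a pairwise Elo match against every other card in that pack. The K-factor of 16 and starting rating of 1500 are arbitrary choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical card record; colors uses single letters such as "W" or "U"."""
    name: str
    colors: frozenset = frozenset()  # empty for colorless cards
    is_land: bool = False


def train_elo(picks, k=16.0, base=1500.0):
    """Fit Elo-style card ratings from (picked_name, pack_names) records.

    Each pick counts as the picked card beating every other card in that pack.
    """
    ratings = defaultdict(lambda: base)
    for picked, pack in picks:
        for other in pack:
            if other == picked:
                continue
            # Probability the picked card "wins" under the current ratings.
            expected = 1.0 / (1.0 + 10 ** ((ratings[other] - ratings[picked]) / 400.0))
            delta = k * (1.0 - expected)
            ratings[picked] += delta  # winner gains...
            ratings[other] -= delta   # ...exactly what the loser gives up
    return dict(ratings)


def baseline_pick(pack, ratings, committed_colors, base=1500.0):
    """Pick the highest-rated card, restricted to on-color cards or lands once
    the drafter is committed to two or more colors."""
    def score(card):
        return ratings.get(card.name, base)  # unseen cards get the base rating

    if len(committed_colors) >= 2:
        # Colorless cards pass the subset check too, since frozenset() <= anything.
        playable = [c for c in pack if c.is_land or c.colors <= committed_colors]
        if playable:
            return max(playable, key=score)
    return max(pack, key=score)


# Toy usage with made-up card names.
picks = [("A", ["A", "B", "C"]), ("A", ["A", "B"]), ("B", ["B", "C"])]
ratings = train_elo(picks)
pack = [Card("A", frozenset({"R"})), Card("B", frozenset({"W"})),
        Card("Plains", is_land=True)]
print(baseline_pick(pack, ratings, frozenset()).name)
print(baseline_pick(pack, ratings, frozenset({"W", "U"})).name)
```

Because updates are zero-sum, the ratings always average to the starting value; note that unseen cards (here the basic land) default to that starting value, which is a design choice worth revisiting on real data.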
Ahh, no, that was just unclear in the post. I'm filtering to 17lands players with a match win rate above 62% who are drafting at a high rank (Diamond or above). I do look at all of those players' drafts, though, even the ones where they do poorly.
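The filtering described in the reply can be sketched like this, assuming a hypothetical per-draft summary table (`DraftRecord` and its fields are illustrative, not the 17lands export schema). Win rate is aggregated per player across all their drafts, the rank cutoff is applied per draft, and a qualifying player's losing drafts are kept, as the reply describes. The rank ladder below is the MTG Arena one, listed lowest to highest.

```python
from dataclasses import dataclass

# Assumed rank ladder, lowest to highest.
RANKS = ["Bronze", "Silver", "Gold", "Platinum", "Diamond", "Mythic"]


@dataclass(frozen=True)
class DraftRecord:
    """Hypothetical per-draft summary row."""
    draft_id: str
    player_id: str
    rank: str          # player's rank when the draft was made
    match_wins: int
    match_losses: int


def player_match_win_rates(drafts):
    """Aggregate match win rate per player across every draft in the dataset."""
    wins, played = {}, {}
    for d in drafts:
        wins[d.player_id] = wins.get(d.player_id, 0) + d.match_wins
        played[d.player_id] = played.get(d.player_id, 0) + d.match_wins + d.match_losses
    return {p: wins[p] / n for p, n in played.items() if n > 0}


def select_training_drafts(drafts, min_win_rate=0.62, min_rank="Diamond"):
    """Keep every draft at or above min_rank by players whose overall match win
    rate exceeds min_win_rate, including drafts where that player did poorly."""
    rates = player_match_win_rates(drafts)
    floor = RANKS.index(min_rank)
    return [
        d for d in drafts
        if rates.get(d.player_id, 0.0) > min_win_rate
        and d.rank in RANKS
        and RANKS.index(d.rank) >= floor
    ]


drafts = [
    DraftRecord("d1", "p1", "Diamond", 7, 2),
    DraftRecord("d2", "p1", "Mythic", 2, 3),   # poor run, still kept
    DraftRecord("d3", "p2", "Gold", 7, 0),     # below Diamond, dropped
    DraftRecord("d4", "p3", "Diamond", 3, 3),  # 50% player, dropped
]
print([d.draft_id for d in select_training_drafts(drafts)])
```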
> Your "accuracy" on the draft seems poor. I'm not sure it means what you think it means. Are you saying that when looking at the high win rate choices, where all the choices were mostly good, you happened to pick the choice that isn't the same as the player who originated the data? It actually seems harder to make a choice among all good choices.
Accuracy here means picking the same card from a given pack as one of the good players did. It's obviously subjective, so it's not a perfect metric, but it's a decent check on the model's ability to emulate a high-quality drafter.
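A minimal sketch of that accuracy metric, assuming picks are compared as plain card-name strings (an illustrative simplification). It also computes a uniform-random reference point, since a random picker matches the human's choice from a pack of n cards with probability 1/n; this gives some context for whether an accuracy figure is "poor". The pack sizes in the usage lines are illustrative only.

```python
def pick_accuracy(predicted, actual):
    """Fraction of picks where the model chose the same card as the human drafter."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("no picks to score")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def chance_accuracy(pack_sizes):
    """Expected accuracy of picking uniformly at random: mean of 1/n over packs."""
    if not pack_sizes:
        raise ValueError("no packs to score")
    return sum(1.0 / n for n in pack_sizes) / len(pack_sizes)


predicted = ["Card A", "Card B", "Card C", "Card D"]
actual = ["Card A", "Card X", "Card C", "Card Y"]
print(pick_accuracy(predicted, actual))
print(chance_accuracy([14, 13, 12, 11]))
```

Late picks from small packs are much easier to match by chance than early ones, so it may be worth reporting accuracy per pick position rather than only as a single average.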