
by doctorpangloss 924 days ago

> With that data, you can extract “ground truth” by looking at the draft picks made by the best players on the service (sorted by win rate).

Do you mean that you are looking at the draft picks from https://www.17lands.com/leaderboard and then sorting by Win Rate? Didn't you mean to choose Match Wins or Trophies? Otherwise, you're not measuring the best players on the service. You're training on draft choices where most choices were very good - i.e., win rate sort will show you the luckiest players, not the best ones. That will naturally show up in any validation or testing you do too.

Shouldn't this be compared not to an LLM baseline, but to a baseline where an "Elo" style score is computed for each card compared to others from the 17lands data; then, until you have two colors, suggest the best scoring card, or when you do have color(s), suggest the best scoring card within that color or a land?
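A minimal sketch of that kind of baseline, under loud assumptions: each pick is treated as the picked card "beating" every other card in the pack, ratings use a standard logistic Elo update, and color restriction kicks in once two colors are committed. The names `update_ratings`, `suggest`, `colors_of`, and `my_colors`, and the input layout, are hypothetical and not the 17lands export format.

```python
from collections import defaultdict


def update_ratings(ratings, pack, picked, k=16.0, scale=400.0):
    """Treat the picked card as having beaten every other card in the pack."""
    for other in pack:
        if other == picked:
            continue
        # Expected score of the picked card against the passed card.
        expected = 1.0 / (1.0 + 10 ** ((ratings[other] - ratings[picked]) / scale))
        delta = k * (1.0 - expected)
        ratings[picked] += delta
        ratings[other] -= delta


def suggest(ratings, pack, colors_of, my_colors):
    """Best-rated card overall, or best on-color card once two colors are set.

    colors_of maps a card name to a set of colors; lands are assumed to map
    to an empty set, so they always count as on-color.
    """
    candidates = pack
    if len(my_colors) >= 2:
        on_color = [card for card in pack if colors_of[card] <= my_colors]
        candidates = on_color or pack  # fall back if nothing is on-color
    return max(candidates, key=lambda card: ratings[card])


# Hypothetical usage: ratings start at 1500 and are updated pick by pick.
ratings = defaultdict(lambda: 1500.0)
update_ratings(ratings, ["Card A", "Card B", "Card C"], "Card A")
```

Comparing the fine-tuned model's pick agreement against `suggest` on the same held-out packs would show how much it adds over a plain card ranking.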

I think it is possible for the LLM to have some semblance of rules knowledge, but it is more likely that it is picking up on card rarity, costs and "Big" more than anything else for unseen cards.

Your "accuracy" on the draft seems poor. I'm not sure it means what you think it means. Are you saying that when looking at the high win rate choices, where all the choices were mostly good, you happened to pick the choice that isn't the same as the player who originated the data? It actually seems harder to make a choice among all good choices.

Anyway, there is quite a bit going on here.


dmakian 924 days ago

> Do you mean that you are looking at the draft picks from https://www.17lands.com/leaderboard and then sorting by Win Rate? Didn't you mean to choose Match Wins or Trophies? Otherwise, you're not measuring the best players on the service. You're training on draft choices where most choices were very good - i.e., win rate sort will show you the luckiest players, not the best ones. That will naturally show up in any validation or testing you do too.

Ahh no just unclear in the post, I'm filtering to players in 17lands with a > 62% match win rate who are drafting at a high ranking (>=diamond rank). I look at all of those players' drafts though, even the ones where they do poorly.

> Your "accuracy" on the draft seems poor. I'm not sure it means what you think it means. Are you saying that when looking at the high win rate choices, where all the choices were mostly good, you happened to pick the choice that isn't the same as the player who originated the data? It actually seems harder to make a choice among all good choices.

Accuracy here is making the same choice from a given pack as one of the good players. Obviously subjective so not a perfect metric, but a decent check on ability to emulate a high-quality drafter.
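As a rough sketch of that metric (not the actual evaluation code), accuracy is the fraction of packs where the model's pick matches the human's. The function names and the `(pack, human_pick)` layout are assumptions for illustration. The second function gives the accuracy a uniformly random picker would expect on the same packs, which is useful context for judging the number.

```python
def pick_accuracy(examples, choose):
    """Fraction of packs where choose(pack) equals the human's pick.

    examples: iterable of (pack, human_pick) pairs, pack being a list of cards.
    choose:   any callable mapping a pack to one card (model, baseline, ...).
    """
    examples = list(examples)
    if not examples:
        raise ValueError("no examples to score")
    hits = sum(1 for pack, human_pick in examples if choose(pack) == human_pick)
    return hits / len(examples)


def chance_accuracy(examples):
    """Expected accuracy of picking uniformly at random from each pack."""
    examples = list(examples)
    if not examples:
        raise ValueError("no examples to score")
    return sum(1.0 / len(pack) for pack, _ in examples) / len(examples)
```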


Palmik 924 days ago

In Elo-like matchmaking, you typically pair people so that each has roughly a 50% chance to win. Therefore, as the OP says, filtering down to people with a high (60+%) lifetime win rate creates some sort of (interesting) bias.

I would select from all games played at a sufficiently high level.


pclmulqdq 924 days ago

They don't fully use Elo for matchmaking. There's a league system, and you get matched with players in your league. The ranks reset frequently, too.

Edit - I did the math. From the data on the MTG Elo Project, top Magic players have about a 70-75% game win percentage over an average tournament player. They have the top player at ~2300 Elo with the average being around 1500 (in matches), and have scaled the Elo system so that a 200 point gap is a 60% chance to win a best-of-three match (this is NOT the same as Chess Elo scoring).
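The arithmetic can be reproduced roughly as follows. This is a sketch under two assumptions that are not confirmed against the MTG Elo Project itself: the rating curve is logistic and calibrated so a 200-point gap gives a 60% match win chance, and a best-of-three match consists of independent games with a fixed per-game win probability. The function names are made up for the example.

```python
def match_win_prob(gap, ref_gap=200.0, ref_prob=0.60):
    """Match win probability for a rating gap on a logistic curve.

    Calibrated so that a ref_gap-point lead wins with probability ref_prob.
    """
    odds_per_ref = ref_prob / (1.0 - ref_prob)
    odds = odds_per_ref ** (gap / ref_gap)
    return odds / (1.0 + odds)


def game_win_prob_from_match(match_prob, tol=1e-9):
    """Invert m = g^2 * (3 - 2g), the best-of-three win chance, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid * mid * (3.0 - 2.0 * mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# ~2300 vs ~1500 is an 800-point gap.
m = match_win_prob(2300 - 1500)   # about 0.835 per match
g = game_win_prob_from_match(m)   # roughly 0.74 per game
```

Under those assumptions an 800-point gap works out to about an 83.5% match win chance, which corresponds to a per-game win rate inside the 70-75% range quoted above.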


doctorpangloss 924 days ago

Hmm, but that will filter out more than half the players on the Match Wins and Trophies based leaderboards, many of them Diamond and Mythic. So I think your choice of 62% match win rate is almost certainly disproportionately selecting for people who received very good draft choices, even if it includes some actually very good players in the data set.

I mean 62% might feel like a good number, but it's arbitrary, you'd have to justify how you chose it, and just eyeballing it, it is filtering out a lot of very good players with many, many more match wins.

Perhaps you can sort by Latest Rank, and filter out people with 2 or fewer trophies. Or you will have to validate with known bad draft choices in the prompt, to see what it does. Suffice it to say, I still don't think the 17Lands data represents what you think it does.

Like without a direct discussion about measuring and accounting for luck in the draft... for all I know the data is seriously flawed. It probably isn't, but it's maybe one of many, many issues to address when dealing with strategy card game AI problems.


dmakian 924 days ago

Still not clear maybe, I'm selecting players with a 62% lifetime win rate so mostly players who have been good over a larger number of drafts!

Definitely not perfect data though, and agree that defining good in this context is hard -- a lot of the variance of "good" depends on how you play the cards either way. All good points!


doctorpangloss 924 days ago

> I'm selecting players with a 62% lifetime win rate so mostly players who have been good over a larger number of drafts!

Hmm, but a player can have a greater than 62% lifetime win rate over very few drafts, and there may be many of those players... do you see? The win rate isn't a good filter. You chose it, you are trying to justify it, and I'm not convinced, not without the hard numbers.

I'm not confused about what filter you chose. I just think it's a bad filter, and you haven't thought very deeply about how it affects the data, which includes presumably your test and validation data - however you're choosing to test and validate, apparently by hand, by some eyeballed examples.

Anyway I think you have to compare with a non-LLM, non-random baseline to have any sense if this stuff is working at all. I could be dead wrong. I would maybe compare with a community draft picker.


sdwr 924 days ago

A lot of words to say "good players can't actually be good"


donpark 924 days ago

Data selection depends on the use case. Two contrasting use cases I see are:

- Emulation

- Advisor

In the case of MTG player emulation, for example, I think it makes sense to group data by some rankable criterion like win rate and train rank-specific models that can mimic players of each rank.
