|
|
|
|
|
by HDMI_Cable
314 days ago
|
|
| The main challenge in building content-aware memory models is lack of data. To my knowledge, no publicly available dataset exists that contains real-world usage data with both card textual content and review histories. I wonder if the author has ever considered reaching out to makers of Anki decks used by premeds and medical students like the AnKing [1]. They create Anki decks for users studying the MCAT and various Med School curricula, so have a) relatively stable deck content (which is very well annotated and contains lots of key words that would make semantic grouping quite easy) b) probably contains loads of statistics on user reviews (since they have an Anki addon that sends telemetry to their team to make the decks better IIRC), and c) contains incredibly disparate information (all the way from high-school physics to neurochemistry). --- [1]: https://www.theanking.com |
|