by rspeer 3405 days ago
Would it be possible to mirror just the data somewhere else, such as S3?

I don't need the R code, but this sounds like it would make good companion data to my own wordfreq [1]. It would be interesting to see which words are learned early but relatively uncommon in corpora, and generally to be able to measure differences in register between child and adult language.

[1] https://github.com/LuminosoInsight/wordfreq

2 comments

Very cool!

All our code is at http://github.com/langcog/wordbank, and you can access the database directly using the wordbankr R package (on CRAN).

A paper doing something similar to what you describe is in prep, with a conference version here:

http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

We have an R package that you can use to access the data: https://github.com/langcog/wordbankr

We've done some analyses predicting words' learnability from frequency and other factors: http://langcog.stanford.edu/papers_new/braginsky-2016-underr...