Hacker News
Wordbank: An open database of children's vocabulary development (wordbank.stanford.edu)
164 points by Jasamba 3405 days ago
6 comments

My wife is a speech and language therapist for under 5s and we have 2 children under 5 ourselves. There are so many techniques she uses without even thinking about it to encourage communication that, left on my own, I would never have known to use. For example, in the really early days, when a child said "blah blah blah blah" I would have been inclined to repeat it, but now I'll say "that's right, an aeroplane!" (or whatever it is).

Parent-child interaction goes a really long way in child development and if you ever get the chance, it's worth sitting in on a session (whether your child needs extra help or not). A large part of the work my wife does is around enabling parents to assist kids that need more input (through no fault of the parents themselves).

Expected the walrus to be the logo due to an obscure meme. Was not disappointed.
If others are curious: http://knowyourmeme.com/memes/wordbank-walrus

In a vocabulary quiz, a kid interpreted the header "wordbank" as one of the choices.

Looks like the interactivity is running on R Shiny, and I'm hitting "License Quota Reached" errors.
Yes, thanks for your interest - we only have 50 concurrent users licensed. Never thought we'd get this much interest. :)
Would it be possible to mirror just the data somewhere else, such as S3?

I don't need the R code, but this sounds like it would make good companion data to my own wordfreq [1]. It would be interesting to see which words are learned early but relatively uncommon in corpora, and generally to be able to measure differences in register between child and adult language.

[1] https://github.com/LuminosoInsight/wordfreq
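
A rough sketch of the "learned early but uncommon in corpora" comparison, in case it helps make the idea concrete. Everything here is an assumption for illustration: the function name `early_but_rare` is hypothetical, and the numbers are made up, not taken from Wordbank or wordfreq. In practice the ages would come from the Wordbank data and the frequencies from something like wordfreq's Zipf-scale values.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers; all zeros if there is no spread."""
    if not values:
        return []
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def early_but_rare(age_months, zipf):
    """Rank words learned early despite low corpus frequency, most surprising first."""
    # Only words present in both sources can be compared.
    words = sorted(set(age_months) & set(zipf))
    age_z = zscores([age_months[w] for w in words])
    freq_z = zscores([zipf[w] for w in words])
    # A low age and a low frequency both push the score up.
    score = {w: -a - f for w, a, f in zip(words, age_z, freq_z)}
    return sorted(words, key=score.get, reverse=True)


# Made-up illustrative values, NOT real Wordbank or wordfreq numbers.
toy_age_months = {"mommy": 12, "ball": 14, "peekaboo": 15, "the": 30, "because": 32}
toy_zipf = {"mommy": 4.5, "ball": 5.0, "peekaboo": 2.5, "the": 7.7, "because": 6.0}

print(early_but_rare(toy_age_months, toy_zipf))
```

Summing two z-scores is just the simplest possible scoring. A regression of age on frequency, looking at the residuals, would probably be the more defensible version.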

Very cool!

All our code is at http://github.com/langcog/wordbank and you can access the database directly using the wordbankr R package (on CRAN).

A paper doing something similar to what you describe is in prep, with a conference version here:

http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

We have an R package which you can use to access the data: https://github.com/langcog/wordbankr

We've done some analyses predicting words' learnability from frequency and other factors: http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

Comparing Swedish/Danish/English in the vocab:

http://wordbank.stanford.edu/analyses?name=vocab_norms

Seems like the data sample is too small to infer anything useful for Swedish, but comparing Danish and English is interesting. It looks like either Danish kids outperform or English kids underperform. It would be interesting to understand what the major driver of the effect is.

Hi, original researcher here - these forms have different words and different numbers of items, so it's complicated to compare absolute proportions across languages. I wouldn't infer that there are differences between populations from differences in the norms.
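
A toy illustration of why the proportions are hard to compare, under stated assumptions: the word lists and the helper `proportion_produced` below are hypothetical, not actual Wordbank forms. The same child scores very differently depending on which items, and how many, a form happens to contain.

```python
def proportion_produced(known_words, form_items):
    """Share of a form's items that the child produces."""
    items = set(form_items)
    if not items:
        raise ValueError("form has no items")
    return len(items & set(known_words)) / len(items)


# Hypothetical child vocabulary and two hypothetical checklists.
known = {"dog", "ball", "milk", "shoe"}
form_a = ["dog", "ball", "milk", "shoe", "car"]
form_b = ["dog", "ball", "milk", "giraffe", "because",
          "tomorrow", "scissors", "pretend"]

print(proportion_produced(known, form_a))  # 4 of 5 items
print(proportion_produced(known, form_b))  # 3 of 8 items
```

The child's vocabulary is identical in both cases; only the checklist changed.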
Thank you! Appreciate the answer!
Semi-related question: if anyone knows of something similar for physics or chemistry (for slightly older kids), I'd appreciate it!
Funny how it says English and English(British) in those coloured bubbles :)