Wordbank: An open database of children's vocabulary development

	Wordbank: An open database of children's vocabulary development (wordbank.stanford.edu)
	164 points by Jasamba 3405 days ago

6 comments

aidos 3405 days ago

My wife is a speech and language therapist for under-5s, and we have two children under 5 ourselves. She uses many techniques to encourage communication without even thinking about it, techniques I would never have known to use on my own. For example, in the really early days, when a child said "blah blah blah blah", I would have been inclined to repeat it, but now I'll say "That's right, an aeroplane!" (or whatever it is).

Parent-child interaction goes a really long way in child development, and if you ever get the chance, it's worth sitting in on a session (whether your child needs extra help or not). A large part of my wife's work is enabling parents to help kids who need more input (through no fault of the parents themselves).

vsviridov 3405 days ago

Expected the walrus to be the logo due to an obscure meme. Was not disappointed.

100k 3405 days ago

If others are curious: http://knowyourmeme.com/memes/wordbank-walrus

In a vocabulary quiz, a kid interpreted the header "wordbank" as one of the choices.

minimaxir 3405 days ago

Looks like the interactivity is running on R Shiny, and I'm hitting "License Quota Reached" errors.

mcfrank 3405 days ago

Yes, thanks for your interest - we only have 50 concurrent users licensed. Never thought we'd get this much interest. :)

rspeer 3405 days ago

Would it be possible to mirror just the data somewhere else, such as S3?

I don't need the R code, but this sounds like it would make good companion data to my own wordfreq [1]. It would be interesting to see which words are learned early but relatively uncommon in corpora, and generally to be able to measure differences in register between child and adult language.

[1] https://github.com/LuminosoInsight/wordfreq
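A minimal sketch of the comparison described above, assuming you have already built two inputs yourself: per-word production proportions at a fixed age (for example, from a Wordbank export) and per-word Zipf-scale frequencies (for example, from wordfreq's `zipf_frequency`). The function name `early_but_rare` and the rank-gap scoring are hypothetical illustrative choices, not the method of any Wordbank paper.

```python
from typing import Dict, List, Tuple


def early_but_rare(
    acquisition: Dict[str, float],
    zipf: Dict[str, float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank words that children produce earlier than their corpus frequency suggests.

    acquisition: word -> proportion of children producing it at a fixed age
                 (hypothetical input, e.g. built from a Wordbank export).
    zipf:        word -> Zipf-scale corpus frequency
                 (hypothetical input, e.g. from wordfreq.zipf_frequency).
    Returns up to top_n (word, gap) pairs, largest gap first, where
    gap = acquisition percentile rank - frequency percentile rank.
    """
    # Only words present in both sources can be compared.
    shared = sorted(set(acquisition) & set(zipf))
    if not shared:
        return []

    def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
        # Position of each shared word in ascending score order, scaled to [0, 1].
        # sorted() is stable and `shared` is alphabetical, so ties break alphabetically.
        ordered = sorted(shared, key=lambda w: scores[w])
        n = len(ordered)
        if n == 1:
            return {ordered[0]: 0.5}
        return {w: i / (n - 1) for i, w in enumerate(ordered)}

    acq_rank = percentile_ranks(acquisition)
    freq_rank = percentile_ranks(zipf)

    # Positive gap: learned early relative to how common the word is in corpora.
    gaps = [(w, acq_rank[w] - freq_rank[w]) for w in shared]
    gaps.sort(key=lambda pair: (-pair[1], pair[0]))
    return gaps[:top_n]
```

A rank comparison like this is only meaningful within a single form and language, since different forms include different words.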

mcfrank 3405 days ago

Very cool!

All our code is at http://github.com/langcog/wordbank, and you can access the database directly using the wordbankr R package (on CRAN).

A paper doing something similar to what you describe is in prep, with a conference version here:

http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

mikabr 3405 days ago

We have an R package you can use to access the data: https://github.com/langcog/wordbankr

We've done some analyses predicting words' learnability from frequency and other factors: http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

euph0ria 3405 days ago

Comparing Swedish/Danish/English in the vocab:

http://wordbank.stanford.edu/analyses?name=vocab_norms

The data sample seems too small to infer anything useful for Swedish, but comparing Danish and English is interesting: it seems that either Danish children outperform or English-speaking children underperform. It would be interesting to understand what the major driver of the effect is.

mcfrank 3405 days ago

Hi, original researcher here - these forms have different words and different numbers of items, so it's complicated to compare absolute proportions across languages. I wouldn't infer that there are differences between populations from differences in the norms.

euph0ria 3404 days ago

Thank you! Appreciate the answer!

mistermann 3405 days ago

Semi-related question: if anyone knows of something similar for physics or chemistry (for slightly older kids), I would appreciate it!

isanganak 3405 days ago

Funny how it says English and English (British) in those coloured bubbles :)
