Hacker News
by paulfr 4110 days ago
Just 2 weeks ago, I won a machine learning contest with a $20,000 prize pool where the goal was to predict the IQ of a child at age 7, based on various biological measurements and demographic indicators. The data includes whether a child was breastfed.

After reading the article I did some very quick computations and my finding based on the model I developed is that children who were not breastfed have an IQ impairment in the 1-3 IQ points range, after accounting for confounding factors.

This is very consistent with the results published here.

Compared to this study, I believe the methodology I'm using is more powerful for three reasons:

- much larger sample size: the dataset I have access to comprises 12015 children, compared to 3493 for the study

- a larger set of confounding factors is accounted for: notably, the data includes height and weight measurements at up to five points in time

- confounding factors are fully accounted for, rather than hand-waved away. This is a complex model based on random forests and a linear model, and the results are entirely cross-validated.
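To make "accounting for confounding factors" concrete, here is a minimal sketch on synthetic data. It is not the contest model (that one uses random forests plus a linear model); it only shows the general idea with plain least squares. Everything in it is an assumption made up for illustration: the single confounder `edu` (a standardized maternal education score), the effect sizes, and the 2-point true effect.

```python
import random

def slope(y, x):
    """Least-squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx

def residualize(y, x):
    """Return y minus its least-squares fit on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(y, x)
    return [c - my - b * (a - mx) for a, c in zip(x, y)]

def simulate(n=12015, true_effect=2.0, seed=0):
    """Synthetic children: education drives both breastfeeding and IQ."""
    rng = random.Random(seed)
    edu, bf, iq = [], [], []
    for _ in range(n):
        e = rng.gauss(0, 1)                      # hypothetical confounder
        p = 0.5 + 0.15 * max(-2.0, min(2.0, e))  # educated mothers breastfeed more
        b = 1.0 if rng.random() < p else 0.0     # 1 = breastfed
        q = 100 + 8 * e + true_effect * b + rng.gauss(0, 12)
        edu.append(e)
        bf.append(b)
        iq.append(q)
    return edu, bf, iq

edu, bf, iq = simulate()

# Naive estimate: with a 0/1 predictor the slope is just the
# difference in mean IQ between the two groups.
naive = slope(iq, bf)

# Adjusted estimate: remove the part of both variables explained by
# education, then regress residual on residual (Frisch-Waugh-Lovell).
adjusted = slope(residualize(iq, edu), residualize(bf, edu))

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  (true effect: 2.0)")
```

In this toy setup the naive difference is inflated by the education effect, while the adjusted estimate lands near the true value. That only works because the confounder is measured and the relationship is linear by construction.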

Stay tuned for more detailed computations. I will also ask the organizers for the exact definition of breastfeeding used.

---

Edit:

On the other hand, the study is still very appealing because according to the authors, there is little correlation between demographics and breastfeeding in Brazil, whereas the validity of the effect I'm reporting is dependent on whether the demographic model is powerful enough to remove the correlation. Still, I believe most of the correlation is easy to remove, and it isn't clear that there aren't subtle demographic effects even in Brazil. In particular, the proportion of participants with missing IQ data seems to decrease with duration of breastfeeding, and I don't know if they have an explanation for that.


I can't edit any more, so I'm posting as a reply.

I compared my dataset and their data, and it turns out that even in Brazil there is significant correlation between the mother's education and breastfeeding, barely less so than in the dataset I have. So you should probably disregard my edit in the above post: accounting for confounders could be important in both cases.

Thank you for taking the time to comment about your work.

I came to comment on the headline, basically to say that "Breastfeeding 'linked to higher IQ'" sounds awfully backwards -- surely this isn't really a result "in favour of" breastfeeding, but rather pretty damning evidence against formulas/substitutes?

I like your formulation much better:

> children who were not breastfed have an IQ impairment

I see the article climbing back up the front page, and I want to make it abundantly clear that I don't believe that there is necessarily a causal relation from breastfeeding vs formula to IQ.

I could control for demographic factors, but not for other major factors in a child's development -- parental IQ or views regarding parenting are huge factors in a child's IQ, and they may also influence the decision of whether or not to breastfeed. If parenting books unanimously decided that formula was bad, then parents who cared enough about their children to read them and follow their advice would be more likely to breastfeed and you would see a positive correlation with IQ, even if formula were completely equivalent to breastfeeding.

So while there is a clear correlation after controlling for demographics, there is not necessarily causation. My formulation really wasn't meant to imply causation; it only reflected how I handled unknown data (I grouped unknown-status children together with breastfed children). When talking about correlation, it's not meaningful to make a distinction between IQ impairment and IQ gain.
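The parenting-books argument can be sketched as a small simulation. This is purely synthetic and all names and numbers are assumptions: `demo` stands for the measured demographics, `care` for an unmeasured parental factor, and the true effect of breastfeeding on IQ is set to exactly zero.

```python
import random

def slope(y, x):
    """Least-squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx

def residualize(y, x):
    """Return y minus its least-squares fit on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(y, x)
    return [c - my - b * (a - mx) for a, c in zip(x, y)]

def clip(v, lo=-2.0, hi=2.0):
    return max(lo, min(hi, v))

rng = random.Random(1)
demo, care, bf, iq = [], [], [], []
for _ in range(12015):
    d = rng.gauss(0, 1)   # measured demographics (hypothetical score)
    c = rng.gauss(0, 1)   # unmeasured parental factor (hypothetical)
    p = 0.5 + 0.1 * clip(d) + 0.1 * clip(c)  # both raise the odds of breastfeeding
    b = 1.0 if rng.random() < p else 0.0
    q = 100 + 8 * d + 4 * c + 0.0 * b + rng.gauss(0, 12)  # breastfeeding does nothing
    demo.append(d)
    care.append(c)
    bf.append(b)
    iq.append(q)

# What an analyst with only demographics can do: adjust for demo.
bf_d, iq_d = residualize(bf, demo), residualize(iq, demo)
demo_only = slope(iq_d, bf_d)

# What would be possible if the parental factor were observed:
# partial it out as well (Frisch-Waugh-Lovell applied a second time).
care_d = residualize(care, demo)
oracle = slope(residualize(iq_d, care_d), residualize(bf_d, care_d))

print(f"adjusted for demographics only: {demo_only:.2f}")
print(f"adjusted for both factors:      {oracle:.2f}")
```

Under these assumptions the demographics-only estimate stays clearly positive even though the true effect is zero, and it only goes away once the unmeasured factor is included, which is exactly the information a real dataset would lack.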

I really think this shouldn't get this much attention without double-blind trials. Especially for a difference as marginal as 1-3 IQ points.

Even if we ignore the issue of residual confounding for a moment, how is one IQ point "damning"?

> I won a machine learning contest with a $20,000 prize pool

Who organized that contest?

The contest was hosted on TopCoder. The name of the organization that provided the data and funded it is not public, but I'm told it will be made public at some point.

http://community.topcoder.com/longcontest/?module=ViewProble...

Looks like there are some more TopCoders here :)

I ended up 5th, and yeah - the demographic variables are by far the most important.

As Buffett used to say - it's an 'ovarian lottery' and you better have luck at it.

> impairment in the 1-3 IQ points range, after accounting for confounding factors.

But 1-3 IQ points doesn't seem like much, right? I mean, what difference would it make in real life activities?

It's roughly the same as the change that happened when we stopped using leaded gas, which seems to have had a huge effect on society. http://en.wikipedia.org/wiki/Tetraethyllead#Toxicity

Also it's just one relatively simple thing. Imagine we find several things that can all increase IQ by a couple points...
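One way to see why a small average shift could still matter at the population level is to look at the tails of the distribution. This is a rough sketch under two assumptions that are not from the study: IQ is normally distributed with mean 100 and standard deviation 15, and a 2-point change shifts the whole distribution uniformly. The cutoffs 70 and 130 are just two standard deviations from the mean.

```python
from statistics import NormalDist

def tail_fractions(mean, sd=15.0, low=70.0, high=130.0):
    """Fraction of a normal population below `low` and above `high`."""
    dist = NormalDist(mean, sd)
    return dist.cdf(low), 1.0 - dist.cdf(high)

base_low, base_high = tail_fractions(100.0)
shift_low, shift_high = tail_fractions(102.0)  # assumed uniform +2 point shift

print(f"below 70:  {base_low:.4f} -> {shift_low:.4f} "
      f"(ratio {shift_low / base_low:.2f})")
print(f"above 130: {base_high:.4f} -> {shift_high:.4f} "
      f"(ratio {shift_high / base_high:.2f})")
```

The point is only that a shift too small to notice in any one person changes the share of the population beyond a fixed cutoff by a much larger relative amount. Whether a real-world effect behaves like a uniform shift is a separate question.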

I've never had my IQ tested, but when I read about studies like these, they usually demoralize me, out of fear of competitive behavior. I think this leads to low self-esteem, and to imagining that I have a low IQ as my default attitude (despite high marks through schooling and postgraduate education, an MSc; possibly imposter syndrome). It can be a self-defeating attitude, but what it usually does is make me argue with people about IQ tests, and explain why they are not necessarily indicative of intelligence, individually or globally.

I can imagine that merely the act of measuring IQ has significant effect on the population. I can not imagine a population that exists without it, but I imagine it would also have a huge effect on society, as you similarly hypothesize about the 1-3 point increase across the population globally.

This, but also, IQ isn't an exact score, right? Wouldn't this fit snugly within the margin of error?

Not only that, but are people even still using IQ as a metric of intelligence? As I understand it IQ is only a measure of one type of intelligence. I mean I'm all for giving a child every advantage you can fathom as a parent, so I think this is still kind of cool research, but I don't see how it equates to anything truly meaningful.

It kind of just adds one more point in the "breastfeeding is good" column.

1-3 IQ points can mean the difference between living on Hamburger Helper in a tenement, and living on fresh lobster on a 300 foot yacht.

I'm kidding, btw.

Wow, that's very interesting work. I'd love to see a write-up -- not just with respect to breastfeeding, but generally on your findings.

I found a big improvement to my model just after the contest was over, so I made a proposal to develop the improved model and do some more in-depth analysis. I'm currently waiting for their reply!

If they accept I'll have an opportunity to look deeper -- it's one thing to develop an efficient model, but fully exploiting it in order to gain a better understanding of the data takes some work. A limitation of these contests is that you're rewarded for producing a very efficient model, but there is very little emphasis on analysis of your model once you've built it. I think it's a shame, because the person who built the model is often in the best position to have a good intuition of both the dataset and why the model had to be built that way.

I've been considering opening a blog, but I haven't found time to do so yet.

Briefly, the purpose of the contest wasn't to understand the effect of breastfeeding, but to understand how important normal child growth is to mental development. They included several scenarios: with all data available, with demographic data removed, and with demographic data and growth curves removed. Unfortunately, IQ is so overwhelmingly affected by demographics that the scenarios without demographic data devolved into a game of extracting all the demographic data that was leaked by non-demographic variables. And when demographic data is available, more than 90% of the variance extracted by the model comes from demographic data rather than biological measurements!

It's really disheartening to think that depending on the social setting you come from, you start with an IQ of 85 or 115 -- at age 7...

Pardon a possibly dumb question, but what kinds of data are considered "demographic" in your specific case? Can you give some examples?