by ariel-faigon 3600 days ago
Thanks so much for all the excellent comments. There was definitely overfitting with 4 passes.

No more. I've updated the Makefile to run only one pass, changed the options so it runs with older versions of vw, fixed the misspellings of 'gioza', removed 'mayo' (which ended up on the wrong side because it appeared only twice, both times alongside the bun), and regenerated the chart.

All the main conclusions remain intact.

In the end, I urge everyone to use their own data; that was the main purpose of sharing this code. My dataset is small, awfully noisy and insufficient. There are no p-values and no rigorous statistics, so please don't read too much into the minute details. In my view, the important part is the journey of discovering the top factors. The ML was just one aid in this discovery process. The proof for me was my actual, sustainable weight loss, which came after I (very slowly) realized the top factors that eventually worked for me. Thanks again.


I don't think it matters whether you run 4 passes or 1 pass; it's still going to overfit. You can run an online linear regression in a single pass too, but that doesn't magic away the uncertainties. The results are still going to be garbage, and any effects you see are due to your health-consciousness and not to any specific dietary choices you make (how could they be, when the data is so weak and noisy that each item can easily flip signs?).
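To make the single-pass point concrete, here is a minimal sketch of one pass of stochastic gradient descent on squared loss, roughly the kind of online linear regression vw does by default (this is not vw's actual implementation, which adds extras like adaptive and normalized updates). It assumes each day is a dict of hypothetical food-item features with the day's weight change as the target. With made-up toy values, just feeding the same two days in a different order flips the sign of the "bun" weight.

```python
def online_linear_regression(examples, learning_rate=0.1):
    """One pass of SGD on squared loss over (features, target) pairs.

    features: dict mapping a (hypothetical) food-item name to its value.
    Returns (weights, bias) after a single pass.
    """
    weights = {}
    bias = 0.0
    for features, target in examples:
        # Predict with the current model; unseen items count as 0.
        prediction = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
        error = prediction - target
        # Gradient step on 0.5 * error**2 for the bias and each active feature.
        bias -= learning_rate * error
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) - learning_rate * error * v
    return weights, bias


# Toy, made-up days: same data, two different orders.
day1 = ({"bun": 1.0}, 0.5)
day2 = ({"bun": 1.0, "mayo": 1.0}, -0.5)
w_a, _ = online_linear_regression([day1, day2])
w_b, _ = online_linear_regression([day2, day1])
print(w_a["bun"], w_b["bun"])  # opposite signs
```

A single pass avoids re-fitting the same examples over and over, but with this little data the learned weights still depend heavily on example order and noise.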
Thanks so much. Your comments are really helpful.

I realized early on that the data is hopelessly noisy, due to the small daily changes and the scale's resolution. So rather than trying to build a perfect model to gauge the variable importance of each and every kind of food, I focused on the few days when the weight change was more significant, hoping I could detect some signal in those and then extrapolate and explore further from there. That's why I sorted the dataset by abs(delta), and that's what consistently pointed me towards sleep/fasting as the #1 factor. I do agree that the full list/model is garbage, in the sense that probably 80% or so of it is woefully inaccurate/flipped, noisy, overfitted, etc. The main point was to lead me in the right direction by looking at the big picture and what stood out.
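The sort-by-abs(delta) step described above could look something like this minimal sketch. The `Day` record, its field names, and `largest_changes` are hypothetical, assuming each day stores the items logged and the weight change from the previous day; the original code may organize its data differently.

```python
from typing import NamedTuple


class Day(NamedTuple):
    foods: frozenset  # hypothetical: items/factors logged that day
    delta: float      # hypothetical: weight change vs. the previous day


def largest_changes(days, top_k):
    """Return the top_k days with the largest absolute weight change."""
    return sorted(days, key=lambda d: abs(d.delta), reverse=True)[:top_k]


days = [
    Day(frozenset({"bun"}), 0.2),
    Day(frozenset({"long-sleep"}), -1.0),
    Day(frozenset({"pizza"}), 0.6),
]
for day in largest_changes(days, 2):
    print(day.delta, sorted(day.foods))
```

Looking only at the biggest swings trades coverage for signal: days whose change is within the scale's resolution are mostly noise, so they are deprioritized rather than fitted.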

And what stood out were two things: 1) sleep (fasting duration) and 2) fat vs. carbs. I think everything else should be ignored, and I think we're in total agreement on this point.

Does this sound more sensible to you?

Is it possible weight loss made you sleepy?