Hacker News


	by rjdagost 1343 days ago
	If there's one thing I've learned from biomedical data modeling and machine learning, it's that "it's complicated". In biomedical settings, getting more data is often not simple at all, especially for rare diseases. In areas like drug discovery, a single new data point (for example, the effect of a drug candidate in a human clinical setting) can require a huge expenditure of time and money.

	Biomedical results are also often plagued by hidden confounding variables, and simply adding more data without detecting and accounting for these sources of bias can be disastrous. For example, measurements from lab #1 may show persistent errors that are not present in lab #2, and blindly adding more data from lab #1 can produce worse models. My conclusion is that you really need domain knowledge to know whether you're fooling yourself with your great-looking modeling results. There's no simple statistical test that tells you whether your data is acceptable.
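To make the lab #1 / lab #2 point concrete, here is a toy simulation. It is a minimal sketch built on invented assumptions: the true relationship is a hypothetical `y = 2x`, lab #1 adds a made-up systematic offset of 1.5, and both labs share the same random noise. It fits a straight line by ordinary least squares on (a) a small clean lab #2 set and (b) that set pooled with ten times as much lab #1 data. The pooled model is worse against the unbiased reference, yet it looks fine when checked against more lab #1-style data. In real work you do not have the unbiased reference used here, which is exactly why a simple test on your own data cannot reveal the problem.

```python
import random
import statistics


def simulate_lab(n, offset, noise_sd, rng):
    """Hypothetical measurements y = 2x + offset + noise, with x in [0, 10].

    `offset` stands in for a persistent, systematic lab error (invented values).
    """
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + offset + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


def fit_line(data):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx
    return a, my - a * mx


def mse(model, data):
    """Mean squared prediction error of a fitted line on (x, y) pairs."""
    a, b = model
    return statistics.fmean((y - (a * x + b)) ** 2 for x, y in data)


rng = random.Random(0)  # fixed seed so the run is reproducible

lab2 = simulate_lab(50, offset=0.0, noise_sd=1.0, rng=rng)   # small, unbiased lab
lab1 = simulate_lab(500, offset=1.5, noise_sd=1.0, rng=rng)  # large, biased lab
reference = simulate_lab(1000, offset=0.0, noise_sd=1.0, rng=rng)  # unknowable in practice
lab1_like = simulate_lab(1000, offset=1.5, noise_sd=1.0, rng=rng)  # "more of the same" check data

clean_model = fit_line(lab2)
pooled_model = fit_line(lab2 + lab1)

err_clean = mse(clean_model, reference)
err_pooled = mse(pooled_model, reference)
err_self_check = mse(pooled_model, lab1_like)

print("lab 2 only vs reference:     ", round(err_clean, 3))
print("pooled vs reference:         ", round(err_pooled, 3))
print("pooled vs lab-1-style check: ", round(err_self_check, 3))
```

With these assumed numbers the pooled fit's intercept is pulled toward lab #1's offset, so its error against the reference grows even though it has eleven times more data, while its "self-check" error stays close to the pure noise level.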