Hacker News


	by mikepurvis 456 days ago
	Maybe? If the dataset is large and the stakes are low, you might just drop the affected records, or mark them as incomplete somehow. Or generate a failures spool on the side for manual review after the fact. Certainly in a lot of research settings it could be enough to just note that 3% of your input records had to be excluded due to data-validation issues, and then move on with whatever the analysis is. It's not usually realistic to force your data source into compliance, and manually fixing the records in between usually isn't worth the effort either.
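	A minimal Python sketch of the drop-or-flag approach with a failures spool on the side might look like the code below. Everything in it is an illustrative assumption: the validation rule, the `partition_records` helper, the `incomplete` flag, and the JSON-lines spool format are hypothetical. It only shows the general shape: validate each record, write failures to a spool for later manual review, and report the share of input that was excluded.

```python
import io
import json
from typing import Any, Callable, Iterable, TextIO

Record = dict[str, Any]


def validate(record: Record) -> list[str]:
    """Hypothetical rule: 'id' must be non-empty and 'value' must be numeric."""
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    if not isinstance(record.get("value"), (int, float)):
        problems.append("value is not numeric")
    return problems


def partition_records(
    records: Iterable[Record],
    spool: TextIO,
    check: Callable[[Record], list[str]] = validate,
    drop: bool = True,
) -> tuple[list[Record], int, int]:
    """Hypothetical helper: split records into usable ones and a failures spool.

    Failing records are always written to `spool` as one JSON object per line.
    With drop=True they are excluded; with drop=False they are kept but tagged
    with an (assumed) "incomplete": True field.
    Returns (kept records, total records seen, records that failed validation).
    """
    kept: list[Record] = []
    total = 0
    failed = 0
    for record in records:
        total += 1
        problems = check(record)
        if not problems:
            kept.append(record)
            continue
        failed += 1
        spool.write(json.dumps({"record": record, "problems": problems}) + "\n")
        if not drop:
            kept.append({**record, "incomplete": True})
    return kept, total, failed


# Small made-up input: two records fail the hypothetical rule.
rows = [
    {"id": "a", "value": 1.5},
    {"id": "", "value": 2},
    {"id": "c", "value": "n/a"},
    {"id": "d", "value": 4},
]
spool = io.StringIO()  # in practice, an open file for later manual review
kept, total, failed = partition_records(rows, spool)
print(f"{failed / total:.0%} of {total} input records excluded due to validation issues")
```

	The spool keeps the rejected rows and the reasons they failed, so excluding them from the analysis doesn't mean losing them. Passing `drop=False` covers the "mark them as incomplete" option instead.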