| HN Mirror


	by alanbernstein 2500 days ago
	I'm the author of the post. I've wanted a tool like this for a while, and I coincidentally discovered the SMOTE paper recently. It's simple enough that I could throw together a prototype in a few hours, and it requires very little understanding of the data set. I was looking for a certain balance between speed/effort and statistical robustness. I wanted a big data set for testing Pilosa performance, not for training ML models or anything else that really cares about the statistics. However, hundreds of repeated records can make histograms look glitchy, so I wanted to avoid the naive approach of duplicating records. Something like SMOTE fit that need well.
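
For readers unfamiliar with the idea, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the code from the post: the function name `smote_like_samples`, the toy `records`, and the choice of `k` are all assumptions made for illustration. It also assumes purely numeric fields and applies the interpolation to the whole data set instead of only a minority class, which seems closer to the use case described above.

```python
import math
import random


def smote_like_samples(records, n_new, k=2, seed=0):
    """Return n_new synthetic records, each interpolated between a
    randomly chosen record and one of its k nearest neighbors.

    Hypothetical helper for illustration; assumes numeric tuples only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(records)
        # Rank every other record by Euclidean distance to the base record.
        others = [r for r in records if r is not base]
        neighbors = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random point on the segment between base and neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy data: two small clusters of 2-D records (made up for this example).
records = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2),
           (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]
new_records = smote_like_samples(records, n_new=10)
```

Because each new record lands somewhere between two existing ones, the output fills in the space around the originals instead of stacking exact copies, which is what keeps histograms from looking glitchy. Categorical fields would need separate handling that this sketch does not attempt.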

1 comment

juandes 2493 days ago

I agree with you. I have a bit of experience using SMOTE, and one of the things that keeps me using it is its simplicity and versatility. Just like you, a couple of days ago I wrote a small prototype showing how to balance an already synthetic dataset, and I was very, very satisfied with the results. I'll share it with you in case you are interested:

https://kite.com/blog/python/smote-python-imbalanced-learn-f...
