Hacker News
by barkteryx 3078 days ago
This reads as an amazing journey. Kudos for your pursuit of a better process.

It seems to me that so many (online) courses jump to applying tf/pytorch to a predefined dataset, whereas most of the work is in preparing the data. I have a personal project I'd like to try out classifying images, and haven't had much luck finding resources on building my own training dataset.

Can you recommend any resources on assembling and collating your own dataset?

1 comments

It should be noted that I deal primarily with geospatial image analysis, so there is a not insignificant amount of bias with regard to what data I'm interested in. I like using the USDA NAIP API for imagery, since I can pull imagery directly into Python or R using GDAL. I rely heavily on freely available public utility data sets (parcel-level utility data). Beyond that, and other than as a starting point, your training data is always going to be something you've invested in heavily. Good training data is 100% the game. No modeling exercise is going to go well on poor-quality training data. Currently, as a personal project, I'm trying to develop a platform for developing and training data for geospatial modeling. If you're interested, hit me up on a PM and I can explain it in more detail (after work).
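For anyone wondering what the "imagery into Python via GDAL" step might feed into, here is a rough sketch of one common way to cut a large raster into fixed-size training chips. It is not the workflow described above, just an illustration: the function name `chip_windows`, the chip size, and the stride are all made up for the example, and it only computes pixel windows so it runs without GDAL installed.

```python
from typing import Iterator, Tuple

# (xoff, yoff, xsize, ysize) in pixels; hypothetical alias for this sketch
Window = Tuple[int, int, int, int]


def chip_windows(width: int, height: int,
                 chip: int = 256, stride: int = 256) -> Iterator[Window]:
    """Yield pixel windows covering a width x height raster with square chips.

    Partial chips at the right and bottom edges are skipped so that every
    training chip has the same shape.
    """
    if chip <= 0 or stride <= 0:
        raise ValueError("chip and stride must be positive")
    for yoff in range(0, height - chip + 1, stride):
        for xoff in range(0, width - chip + 1, stride):
            yield (xoff, yoff, chip, chip)


# Example: a 1000 x 600 pixel image cut into non-overlapping 256 px chips.
windows = list(chip_windows(1000, 600, chip=256, stride=256))
print(len(windows), windows[0], windows[-1])

# Assumption: with GDAL's Python bindings, each window could then be read
# with something like ds.ReadAsArray(xoff, yoff, xsize, ysize) on an opened
# dataset, and saved alongside whatever label you assign to that chip.
```

A stride smaller than the chip size gives overlapping chips, which is one cheap way to get more samples out of limited labeled imagery.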
PM OTW...
Not sure if there are PMs on HN? PatientPolly@forward.cat otherwise.