Hacker News
by kbennatti 1793 days ago
Ooh, I love MTL. That's the first place I lived in Canada. Great question. We used SEC enforcement actions related to fraud as our gold labels (fairly common practice in academia). The really key thing here is that you need to be careful about which years you use for training: if you include years that are too late in the fraud cycle, you end up with significant target leakage, e.g., the filings will say, "we're being investigated for fraud." We ended up manually reviewing all of our data/labels, which took over a month. We also used settled class action lawsuits as silver labels, plus a few other, more frequent labels as bronze labels.
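A minimal sketch of the tiered-label idea described above, assuming a simple per-company, per-year setup. Everything here is hypothetical and illustrative, not the poster's actual pipeline: the `LabelEvent`/`Filing` records, the `disclosure_year` field, the `leakage_buffer` of one year, and the choice to treat pre-fraud filings as negatives are all assumptions. It shows two things: ranking label sources gold > silver > bronze, and dropping filings from late in the fraud cycle (near or after public disclosure) to limit target leakage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical ranking of label sources: higher = more trusted.
TIER_RANK = {"gold": 3, "silver": 2, "bronze": 1}


@dataclass(frozen=True)
class LabelEvent:
    company: str
    source_tier: str       # "gold" (SEC enforcement), "silver" (settled class action), "bronze" (other)
    fraud_start_year: int  # first year the fraud is believed to affect filings
    disclosure_year: int   # hypothetical: year the fraud/investigation became public


@dataclass(frozen=True)
class Filing:
    company: str
    year: int


Row = Tuple[str, int, int, Optional[str]]  # (company, year, label, tier)


def build_training_rows(
    filings: List[Filing], events: List[LabelEvent], leakage_buffer: int = 1
) -> List[Row]:
    rows: List[Row] = []
    for f in filings:
        company_events = [e for e in events if e.company == f.company]
        # Drop filings too late in the fraud cycle: at or after
        # (disclosure_year - leakage_buffer) for any event, since these
        # may mention the investigation and leak the target.
        if any(f.year >= e.disclosure_year - leakage_buffer for e in company_events):
            continue
        # Events whose fraud period has started by this filing's year.
        covering = [e for e in company_events if e.fraud_start_year <= f.year]
        if covering:
            # Keep the most trusted label source available for this filing.
            best = max(covering, key=lambda e: TIER_RANK[e.source_tier])
            rows.append((f.company, f.year, 1, best.source_tier))
        else:
            # Assumption: filings with no active fraud period are negatives.
            rows.append((f.company, f.year, 0, None))
    return rows


if __name__ == "__main__":
    events = [
        LabelEvent("ACME", "gold", 2010, 2014),
        LabelEvent("ACME", "silver", 2011, 2015),
        LabelEvent("BETA", "bronze", 2012, 2016),
    ]
    filings = [Filing("ACME", 2009), Filing("ACME", 2011), Filing("ACME", 2013),
               Filing("BETA", 2013), Filing("GAMMA", 2013)]
    for row in build_training_rows(filings, events):
        print(row)
```

In this toy run, the ACME 2013 filing is dropped because it falls within one year of the 2014 disclosure, and the ACME 2011 filing takes the gold label even though a silver source also covers it. The buffer size is the main knob; in practice it would be chosen by inspecting when filings start to reference the investigation, which is the kind of thing manual review catches.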

Thanks for the quick answer! Follow-up on this, as it's a space I'm actively working in: did you use or build any tools for the labeling process, or was it Excel? :D Also, do you ultimately see/position your solution as an AI-powered exploration tool that helps humans derive better insights faster (with the NLP side simply assisting the discovery process), or do you see the models (and the resulting flags) eventually being able to completely replace human intuition?
Our annotation process has been manual so far, but we are working on building something to make it more efficient :-) We see our solution as an AI-powered assistant for qualitative research that makes the job of an analyst much easier, and don't see it as 'replacing' humans for the foreseeable future.