by tiburon
2346 days ago
@PaulHoule I saw in the conclusion of your research that you point to classifying the content of the webpages behind the links, so I guess you are working on it. I think the classifier will improve a lot if it has more content to analyse.
|
Here is the progress I've made since then.
After I did that project I spent a year working on text analysis tools for somebody else. Then, while looking for a new job, I wrote a new version of that software to scrape thousands of job listings and classify them in a similar way, based on the whole text of each listing, which is usually a few paragraphs.
That software has a much better user interface than the old software for adding labels and it's designed to handle "workflow" tasks that have some human and some automated elements.
If I do more work in this area I will probably build on that code. Personally I think the framework for getting training data and putting the model to work is more important than the model itself. (That said, with a good document embedding I think you could get good results with less training data.)
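To make the "some human, some automated" idea concrete, here is a rough sketch of how such a workflow could route items. It is not the code from my project: the names `embed`, `train`, and `route`, the 0.3 threshold, and the bag-of-words stand-in for a real document embedding are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Stand-in embedding (assumption): L2-normalised bag of words.
    A real document embedding model would replace this function."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled):
    """Sum the embeddings of each label's examples into one centroid."""
    centroids = defaultdict(Counter)
    for text, label in labeled:
        centroids[label].update(embed(text))
    return dict(centroids)


def route(text, centroids, threshold=0.3):
    """Return (label, score, queue). Items scoring below the
    (hypothetical) threshold are sent to a human for labelling."""
    vec = embed(text)
    label, score = max(
        ((lab, cosine(vec, cen)) for lab, cen in centroids.items()),
        key=lambda pair: pair[1],
    )
    queue = "auto" if score >= threshold else "human_review"
    return label, score, queue
```

You would call `train` on whatever has been labelled so far, run `route` on each new listing, and feed the labels that come back from the human queue into the next round of training. Swapping a better embedding into `embed` is the only change needed to test the "less training data" guess.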