by jeremyarussell 5197 days ago
In the NLP class, the programming assignments come with special formatting, headers, etc. I kind of want to write a script that uses NLP to snag the class's programming instruction pages (as well as the example code, etc.). Seems like that would be fun to do.

But in that case, wouldn't you be looking to get the essence of an HTML document, the useful plain text? And if so, wouldn't parsing with regular expressions or something similar be better than NLP? I haven't really done any scraping or parsing of documents/text, so I'm not too sure.
It's possible, yeah, though I like the formatting, highlighting, borders, etc., since they group the different sections of the instructions together.

I see what you mean, though. It's not really full NLP either way; I just used that term in place of regular expressions because the NLP class is where I learned about them (the first homework is a phone and email scraper). Probably my fault for getting the terminology wrong.
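
For anyone curious what a regex-based phone and email scraper looks like, here is a rough sketch in Python. It is not the class's homework solution; the patterns are deliberately simplified, and the names (`EMAIL_RE`, `PHONE_RE`, `scrape_contacts`) and the sample text are made up for illustration.

```python
import re

# Simplified patterns for illustration only; real addresses and numbers vary far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-style numbers such as 650-555-0134, 650.555.0134, or (650) 555-0134.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})\b")


def scrape_contacts(text):
    """Return (emails, phones) found in text; phones are normalized to NNN-NNN-NNNN."""
    emails = EMAIL_RE.findall(text)
    phones = ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]
    return emails, phones


# Hypothetical sample input.
sample = "Contact jane@example.edu or call (650) 555-0134; fax 650.555.0199."
emails, phones = scrape_contacts(sample)
print(emails)
print(phones)
```

This only handles plainly written addresses and numbers. Obfuscated forms like "jane at example dot edu" would need extra patterns, which is probably where most of the work in an assignment like that goes.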