The service would take off much more if, instead of defining search patterns as regular expressions, they were defined as jQuery-style expressions that acknowledge the DOM and let you find all <title> tags that exist in the <head>. Yes, you can do this with regexps, but parsing HTML shouldn't be a regexp task.
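To make the difference concrete, here is a minimal sketch of a DOM-aware lookup using only Python's standard library. The names `DescendantSelector` and `select_text` are made up for illustration, and it only supports plain descendant selectors on tag names (like `head title`), which is a tiny subset of what jQuery offers.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DescendantSelector(HTMLParser):
    """Collect the text of elements matching a selector such as "head title"."""

    def __init__(self, selector):
        super().__init__()
        self.path = selector.split()   # e.g. ["head", "title"]
        self.stack = []                # tags that are currently open
        self.capture_depth = None      # stack depth of the element being captured
        self.results = []

    def _matches(self):
        # The innermost open tag must equal the last selector part, and the
        # earlier parts must appear, in order, among its ancestors.
        if not self.stack or self.stack[-1] != self.path[-1]:
            return False
        ancestors = iter(self.stack[:-1])
        return all(part in ancestors for part in self.path[:-1])

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if self.capture_depth is None and self._matches():
            self.capture_depth = len(self.stack)
            self.results.append("")

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass
        if self.capture_depth is not None and len(self.stack) < self.capture_depth:
            self.capture_depth = None

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.results[-1] += data


def select_text(html, selector):
    """Return the stripped text of every element matching the selector."""
    parser = DescendantSelector(selector)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]
```

Calling `select_text(page, "head title")` picks up only the document title, while a naive `<title>(.*?)</title>` regexp would also match a `<title>` nested somewhere in the body, such as the one inside an inline SVG.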
Oh, I'd like to see email gateways too... point a stream of emails at it and parse those. I'm thinking of scenarios like tripit.com taking in tons of different emails and parsing them to extract travel info.
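A rough sketch of what the email side could look like, using Python's standard `email` package. The `Flight:` / `Departs:` / `Arrives:` body layout and the `extract_travel_info` name are assumptions invented for this example; real confirmation emails vary a lot by sender, and this is not how tripit.com or any other service actually does it.

```python
import re
from email import message_from_string, policy

# Hypothetical "Label: value" lines; a real gateway would need per-sender rules.
FIELD_RE = re.compile(r"^(Flight|Departs|Arrives):\s*(.+)$", re.MULTILINE)


def extract_travel_info(raw_email):
    """Parse one raw email and pull out a few labelled travel fields."""
    msg = message_from_string(raw_email, policy=policy.default)

    # Prefer the plain-text part; for a non-multipart message this is the
    # message itself.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    info = {"subject": str(msg["subject"]), "from": str(msg["from"])}
    for name, value in FIELD_RE.findall(text):
        info[name.lower()] = value.strip()
    return info
```

The header and MIME handling is left to the `email` package, so the pattern matching only ever sees decoded plain text rather than raw message source.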
I'm building something right now that includes page parsing, and so far I've only been building in regex support. I like your jQuery selector idea as well. Are there any other ways you can think of that would make searching the contents of a page programmatically easier for you?
May I suggest taking a look at Parsley? It's the syntax they use on www.parselets.com. The documentation for implementing it in your own apps is a little sparse, but the data format is awesome. Here's one that describes scraping HN: