by raquo 6415 days ago
If you are interested only in new posts, you can look in blogs' RSS feeds. They are nearly always in default locations.
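A rough sketch of the "default locations" idea, in Python. The paths below are assumptions based on commonly seen defaults (WordPress, Blogger, Typepad), not an authoritative list, and `candidate_feed_urls` is a made-up helper name:

```python
from urllib.parse import urljoin

# Assumed default feed paths; adjust for the platforms you actually see.
DEFAULT_FEED_PATHS = [
    "/feed/",                # commonly the WordPress default
    "/feeds/posts/default",  # commonly the Blogger default
    "/atom.xml",             # often seen on Typepad
    "/rss.xml",
]

def candidate_feed_urls(blog_url):
    """Return feed URLs worth trying for a blog's base URL."""
    return [urljoin(blog_url, path) for path in DEFAULT_FEED_PATHS]
```

You would still have to fetch each candidate and check that it really is a feed; this only builds the list of URLs to try.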

Or you could parse the URL - I had a similar task some time ago and went with URLs. Blogger and Typepad are consistent; WordPress depends on the blog, of course, but you could figure out the several most popular patterns (e.g. /yyyy/mm/dd/posttitle, /id-posttitle) and get like 90% of all blogs right.
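Something like this could be a starting point for the two patterns mentioned above. It is only a sketch: the regexes, the `POST_PATTERNS` table and the `match_post_url` name are all hypothetical, and real blogs will need more patterns than these two.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern table; each entry is (label, compiled regex on the URL path).
POST_PATTERNS = [
    # /yyyy/mm/dd/posttitle
    ("dated", re.compile(
        r"^/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<slug>[^/]+)/?$")),
    # /id-posttitle
    ("id_slug", re.compile(r"^/(?P<id>\d+)-(?P<slug>[^/]+)/?$")),
]

def match_post_url(url):
    """Return (label, captured fields) for the first matching pattern, else None."""
    path = urlparse(url).path
    for label, pattern in POST_PATTERNS:
        m = pattern.match(path)
        if m:
            return label, m.groupdict()
    return None
```

A dated URL gives you the post date for free, and anything that matches no pattern (a tag or search URL, say) comes back as None.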

Or maybe, just maybe, you could use some third parties that have already figured it out via RSS - maybe Technorati?

1 comments

Yeah, I think that's the route I'll end up going. I've already started developing a pattern system, and overnight I thought of a few ways that might make it easier to get the title and date of a page.

There's only one thing I'm still stumped on, and that's simply how do you tell when you're on the original article page and not an index/tag/search page that still sometimes contains the same content as the article page.