by lawl 3156 days ago
I tried a random Reddit thread. It did not fetch comments, only information about the submission. Then I tried it with HN. Same. Then I tried it with a GitHub issue. Same.

Then I tried it with the first link I got from news.google.com which was nytimes. No article text included.

Maybe I'm misunderstanding the purpose of this? Or was that just a string of bad luck?

3 comments

Something I was doing recently required an API with sports scores, and I found out how astronomically expensive sports APIs are. So I gave this a shot on an ESPN page with game statistics (http://www.espn.com/nba/game?gameId=400974869), and the results were basically the amount of info you'd get from a Facebook link preview.
No, it is not just bad luck. We have not yet concentrated much on "text pages" like blogs or articles, but mainly on pages which contain more data like prices, geo coordinates, social media profiles, ... That said, support for the mentioned pages can simply be added via our point-and-click GUI by any user. Sadly I do not have time right now, but I can add support for these pages by tomorrow.
Ah thanks. Currently I have already written scrapers for the stuff I need. I was mainly just curious. I'll bookmark it to look at again when I need something next time.
So your paid product for scraping and structuring information from a webpage cannot actually return most of the content of a webpage as it is now? Wouldn't that be a more important vertical slice of a product for an MVP than having a fully thought-out pay tier system?
Maybe? It's all about tradeoffs.

I could see why you would want to figure out how much value the MVP is creating, and $$$ is an honest way to do that.

It sounds like two things are happening with the MVP:

- emphasis on more complicated sites with more data (higher propensity to pay)
- this functionality is actually possible, but a user needs to take the time to set it up via the GUI.

Feels like a pretty good tradeoff to me.

Thanks, seems like you got it ;-)
Same here. Really curious though if there is any service/API for harvesting news articles from websites to experiment with text analysis? Haven't found any after some searching, just APIs that provide meta information but not an actual corpus of text.