HN Mirror

by lawl 3156 days ago

I tried a random Reddit thread. It did not fetch comments, only information about the submission. Then I tried it with HN. Same. Then I tried it with a GitHub issue. Same.

Then I tried it with the first link I got from news.google.com which was nytimes. No article text included.

Maybe I'm misunderstanding the purpose of this? Or was that just a string of bad luck?

3 comments

taternuts 3156 days ago

Something I was doing recently required an API with sports scores, and I found out how astronomically expensive sports APIs are. So I gave this a shot on an ESPN page with game statistics (http://www.espn.com/nba/game?gameId=400974869), and the results were basically the amount of info you'd get from a Facebook link preview.

janober 3156 days ago

No, it is not just bad luck. We have not yet concentrated much on "text pages" like blogs or articles. So far the focus has mainly been on pages that contain more data, like prices, geo coordinates, social media profiles, ... That said, support for the mentioned pages can simply be added by any user via our point-and-click GUI. Sadly I do not have time right now, but I can add support for these pages by tomorrow.

lawl 3156 days ago

Ah thanks. Currently I have already written scrapers for the stuff I need. I was mainly just curious. I'll bookmark it to look at again when I need something next time.

weego 3156 days ago

So your paid-for product for scraping and structuring information from a webpage cannot actually return most of the content of a webpage as it is now? Wouldn't that be a more important vertical slice of a product for an MVP than having a fully thought-out pay tier system?

nedwin 3156 days ago

Maybe? It's all about tradeoffs.

I could see why you would want to figure out how much value the MVP is creating, and $$$ is an honest way to do that.

It sounds like two things are happening with the MVP:

- emphasis on more complicated sites with more data (higher propensity to pay)
- this functionality is actually possible, but a user needs to take the time to set it up via the GUI.

Feels like a pretty good tradeoff to me.

janober 3156 days ago

Thanks, seems like you got it ;-)

Treegarden 3156 days ago

Same here. I'm really curious, though, whether there is any service/API for harvesting news articles from websites to experiment with text analysis. I haven't found any after some searching, just APIs that provide meta information but not an actual corpus of text.
