| HN Mirror


	by minimaxir 3517 days ago
	Rvest works fine with tabular data. If, however, you are working with data outside of Wikipedia, you will find that website data is very rarely available in a <table> and is instead part of a hierarchical tree, which is a pain to process/clean in R. In such cases, working with Python/BeautifulSoup4 and importing the clean and normalized data into R will save frustration over time, even offsetting the overhead of using two languages.
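
A rough sketch of the workflow described above: pull records out of nested markup in Python, normalize them, and emit CSV that R can load with `read.csv`. The comment names BeautifulSoup4; this sketch swaps in the standard library's `xml.etree.ElementTree` so it runs without extra installs, which means it only handles well-formed markup. The page fragment, class names, and field names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical page fragment: records live in nested <div>s, not in a <table>.
HTML = """
<div class="listing">
  <div class="item">
    <h2 class="name">Widget</h2>
    <div class="meta"><span class="price">$4.50</span><span class="stock">12 left</span></div>
  </div>
  <div class="item">
    <h2 class="name">Gadget</h2>
    <div class="meta"><span class="price">$10.00</span></div>
  </div>
</div>
"""

def text_of(node, path):
    """Return the stripped text of the first match for path, or None if absent."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None

def extract_rows(markup):
    """Walk the tree and build one flat, typed record per item node."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall(".//div[@class='item']"):
        price = text_of(item, ".//span[@class='price']")
        stock = text_of(item, ".//span[@class='stock']")
        rows.append({
            "name": text_of(item, "./h2[@class='name']"),
            "price": float(price.lstrip("$")) if price else None,
            "stock": int(stock.split()[0]) if stock else None,
        })
    return rows

def to_csv(rows):
    """Serialize the records as CSV text; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "price", "stock"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = extract_rows(HTML)
print(to_csv(rows))
```

Real pages are rarely well-formed XML, so a forgiving parser such as BeautifulSoup4 would normally take the place of `ET.fromstring` here; the walk-and-normalize shape stays the same.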

2 comments

haddr 3517 days ago

It will work with any data, as long as it can be easily retrieved with some CSS selector. Otherwise you would have problems using any web scraping tool.

sixtypoundhound 3517 days ago

JSON is pretty easy to unpack, if you can figure out the callback that gets the data.
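
As a minimal sketch of the unpacking step, assuming the payload has already been fetched: the JSON below is a made-up stand-in for whatever the page's background request returns, and the `results` key and field names are hypothetical. Nested objects are flattened into dotted column names so each record becomes one row.

```python
import json

# Hypothetical payload, standing in for the response of the page's data request.
PAYLOAD = """
{"results": [
  {"id": 1, "user": {"name": "ann", "karma": 120}, "tags": ["r", "scraping"]},
  {"id": 2, "user": {"name": "bob", "karma": 45}, "tags": []}
]}
"""

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Collapse lists into a single delimited cell.
            flat[name] = ";".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat

rows = [flatten(r) for r in json.loads(PAYLOAD)["results"]]
for row in rows:
    print(row)
```

Finding the request that returns the data is the hard part and is not shown here; once the payload is in hand, the flattening is mechanical.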

minimaxir 3517 days ago

The primary use case for web scraping tools like Rvest is data that has no JSON endpoint, where everything is rendered server-side or the page is static.

baldfat 3516 days ago

> In such cases, working with Python/BeautifulSoup4

But Rvest is a BeautifulSoup-inspired library and works pretty much the same way?
