by wraptile
1418 days ago
It's perfectly legal! That said, you still need to be resource-polite. A lot of people scrape Zillow with browser automation toolkits like Selenium or Puppeteer because it's a JS-heavy website, but these tools are really bandwidth-intensive. This could, in theory, get you in trouble for DDoS. Instead, since Zillow uses Next.js, you can retrieve the dataset for any page just by parsing the Next.js data embedded in the HTML. This is done by selecting the contents of the <script id="__NEXT_DATA__"> node, which takes minimal resources on both sides. E.g. in Python:
import json
import httpx
from parsel import Selector
response = httpx.get("https://www.zillow.com/b/1625-e-13th-st-brooklyn-ny-5YGKWY/")
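# select only the text inside the node; without ::text, .get() returns the whole <script> element and json.loads fails
script_data = Selector(text=response.text).css('#__NEXT_DATA__::text').get()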
script_data = json.loads(script_data)
# all of the property data is here, for example building details:
print(script_data['building'])
I wrote a tutorial on this if you'd like to learn more: https://scrapfly.io/blog/how-to-scrape-zillow/#scraping-prop...
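
If you'd rather try the __NEXT_DATA__ idea without installing anything or hitting the site, here is a minimal sketch using only the standard library. It swaps parsel for html.parser, and the NextDataParser class, the extract_next_data helper, and the sample HTML are all made up for illustration; a real page's payload is much larger and its key layout may differ.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting once we enter the script node with the right id.
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content can arrive in several chunks, so accumulate them.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the node is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


# Made-up page for illustration only; not real Zillow output.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"example": "value"}}}
</script>
</body></html>
"""

data = extract_next_data(SAMPLE_HTML)
print(data["props"]["pageProps"]["example"])
```

Returning None when the node is missing makes it easy to tell a changed page layout apart from a JSON parsing error.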