| HN Mirror

by wraptile 1418 days ago

It's perfectly legal!

That being said, you still need to be resource-polite. A lot of people scrape Zillow through browser automation toolkits like Selenium, Puppeteer, etc. because it's a JS-heavy website, and these tools are really bandwidth-intensive. At enough volume this could, in theory, get you in trouble for what looks like a DDoS attack.
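As a rough illustration of what "resource-polite" can mean in practice, here is a minimal sketch that spaces requests out. The `throttled` helper and the 2-second default are my own assumptions for illustration, not anything Zillow publishes; pick an interval that suits your case.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(urls: Iterable[str],
              min_interval: float = 2.0,  # assumed value, not an official limit
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield URLs one at a time, at least `min_interval` seconds apart."""
    last = None
    for url in urls:
        if last is not None:
            # time still owed since the previous URL was handed out
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield url
```

You would loop over `throttled(urls)` and make one request per iteration. Throttling keeps the request rate down, but it does nothing about how heavy each browser-rendered page load is.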

Instead, since Zillow is built on Next.js, you can actually retrieve the dataset for any page just by parsing the Next.js cache. This can be done by selecting the data in the <script id="__NEXT_DATA__"> node, which requires minimal resources from both sides, e.g. in Python:

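  import json
  import httpx
  from parsel import Selector

  # a single plain HTTP request; no browser needed
  response = httpx.get("https://www.zillow.com/b/1625-e-13th-st-brooklyn-ny-5YGKWY/")
  # ::text selects the JSON inside the <script> tag rather than the tag's own HTML
  script_data = Selector(text=response.text).css('#__NEXT_DATA__::text').get()
  script_data = json.loads(script_data)
  # all of the property data is here, for example building details
  # (the exact key path varies by page; Next.js usually nests page data under props.pageProps):
  print(script_data['building'])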
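If you want to try the idea without hitting Zillow at all, here is a stdlib-only sketch run against a made-up HTML snippet. `SAMPLE_HTML`, its `building` contents, and the `extract_next_data` helper are all hypothetical and only there to show the mechanics; `props.pageProps` is the usual Next.js layout, but the real Zillow payload will look different.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the node is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    return json.loads("".join(parser.chunks))


# Hypothetical page: structure and values are invented for illustration only.
SAMPLE_HTML = """<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"building": {"city": "Brooklyn"}}}, "page": "/b/[slug]"}
</script>
</body></html>"""

data = extract_next_data(SAMPLE_HTML)
page_props = data["props"]["pageProps"]
print(page_props["building"]["city"])
```

Returning None when the node is missing makes it easy to tell "page has no Next.js data" (or you got a block page) apart from a JSON parsing error.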

I wrote a tutorial on this if you'd like to learn more: https://scrapfly.io/blog/how-to-scrape-zillow/#scraping-prop...

1 comment

skanga 1416 days ago

Outstanding tip. Thank you VERY much.
