How to scrape zillow.com and which web scraping API to use

Zillow is one of the biggest real estate listing websites in the United States and contains a vast amount of current and historical real estate data. This makes it the most popular real estate target for web scraping.

Zillow.com uses its own proprietary anti-scraping technology in combination with the PerimeterX anti-bot service. This makes it difficult to scrape Zillow property data reliably, and this is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for Zillow.com at $1.26 per 1,000 scrape requests on average.

Zillow.com scraping API benchmarks

Scrapeway runs weekly benchmarks for Zillow Listings for the most popular web scraping APIs. Here's the table for this week:

#	Service	Success % (change)	Speed (change)	Cost $/1000 (change)
1	Scrapfly	99% -1	5.2s +0.5	$3.32 -0.36
2	Scraperapi	71% -11	25.7s +6.3	$4.9 =
3	WebScrapingAPI	34% -45	5.5s +3.4	$0.32 -2.39
4	Zenrows	22% +16	2.1s -0.1	$0.28 =
5	Scrapingbee	0%	-	-
6	Scrapingant	0%	-	-
7	Scrapingdog	0%	-	-

Data range Nov 22 - Nov 28
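One way to read the table is to filter by a minimum success rate and then compare cost. The sketch below hard-codes this week's figures from the table; the `Result` type, the `cheapest_reliable` helper and the thresholds are illustrative assumptions and not part of the Scrapeway benchmark tooling.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    service: str
    success_pct: int                # success rate in percent
    speed_s: Optional[float]        # speed in seconds, None if no data
    cost_per_1000: Optional[float]  # USD per 1,000 requests, None if no data


# figures copied from this week's table (Nov 22 - Nov 28)
RESULTS = [
    Result("Scrapfly", 99, 5.2, 3.32),
    Result("Scraperapi", 71, 25.7, 4.9),
    Result("WebScrapingAPI", 34, 5.5, 0.32),
    Result("Zenrows", 22, 2.1, 0.28),
    Result("Scrapingbee", 0, None, None),
    Result("Scrapingant", 0, None, None),
    Result("Scrapingdog", 0, None, None),
]


def cheapest_reliable(results, min_success_pct):
    """Return the cheapest service meeting the success threshold, or None."""
    candidates = [
        r for r in results
        if r.success_pct >= min_success_pct and r.cost_per_1000 is not None
    ]
    return min(candidates, key=lambda r: r.cost_per_1000, default=None)


if __name__ == "__main__":
    best = cheapest_reliable(RESULTS, min_success_pct=70)
    print(best.service if best else "no service meets the threshold")
```

Raising or lowering `min_success_pct` shifts the trade-off between reliability and price, so the right threshold depends on how costly a failed scrape is for your project.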

How to scrape zillow.com?

Zillow is one of the easiest targets to scrape as it's a highly efficient JavaScript application that stores all of its data in JSON format, which means a headless browser is not required.

That being said, Zillow.com has a lot of anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.
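Because no service in the benchmarks reaches a 100% success rate, a scraper may want to retry failed requests. The following is a minimal sketch, assuming the wrapped scrape function raises an exception on failure; the `scrape_with_retries` helper and its parameters are hypothetical and not part of any scraping SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def scrape_with_retries(
    scrape_fn: Callable[[str], T],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call scrape_fn(url), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return scrape_fn(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # wait base_delay, then 2x, then 4x, ... before the next try
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable only so the backoff can be exercised in tests without actually waiting; in normal use the default `time.sleep` is enough.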

Zillow's HTML pages embed their data as JSON in Next.js framework variables such as __NEXT_DATA__, so full listing datasets can be extracted easily, making it an easy scraping target overall.
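To illustrate the idea, here is a small sketch that pulls the `__NEXT_DATA__` script out of a page and parses it as JSON. It uses only the Python standard library so it runs without extra dependencies; the `NextDataParser` class, the `extract_next_data` helper and the sample HTML are made up for illustration, and real Zillow pages are far larger.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class NextDataParser(HTMLParser):
    """Collects the text content of <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the script is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


# made-up page, only for illustration
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"example": true}}}
</script>
</body></html>
"""

if __name__ == "__main__":
    print(extract_next_data(SAMPLE_HTML))
```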

Zillow.com scraper

  import json
  from parsel import Selector
  # install using `pip install scrapfly-sdk`
  from scrapfly import ScrapflyClient, ScrapeConfig
  # create an API client instance
  client = ScrapflyClient(key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.scrape(ScrapeConfig(
          url=url,
          asp=True,  # enable anti scraping protection bypass
          render_js=render_js,
          cache=False,
          cache_ttl=900,
          method='GET',
      ))
      return api_result.selector
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # install using `pip install scraperapi`
  from scraper_api import ScraperAPIClient
  # create an API client instance
  client = ScraperAPIClient(api_key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.get(
          url=url,
          headers=headers or {},
          render=render_js,
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # webscrapingapi has a Python SDK but it's not great, use httpx instead:
  # `pip install httpx`
  import httpx
  # create an API client instance
  client = httpx.Client(timeout=180)
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.get(
          # NOTE: endpoint assumed to be WebScrapingAPI's v1 API; confirm it in their docs
          "https://api.webscrapingapi.com/v1",
          headers=headers,
          params={
              "url": url,  # the page to scrape is passed as a query parameter
              "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
              "timeout": 60_000,
          },
      )
      assert api_result.status_code == 200, api_result.reason_phrase
      return Selector(api_result.text)
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # install using `pip install zenrows`
  from zenrows import ZenRowsClient
  # create an API client instance
  client = ZenRowsClient(apikey="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": "True",
          }
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data['html'])
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # install using `pip install scrapingbee`
  from scrapingbee import ScrapingBeeClient
  # create an API client instance
  client = ScrapingBeeClient(api_key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": True,
              "transparent_status_code": True,
          }
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data['body'])
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # install using `pip install scrapingant-client`
  from scrapingant_client import ScrapingAntClient
  # create an API client instance
  client = ScrapingAntClient(token="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      api_result = client.general_request(
          url,
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

  import json
  from parsel import Selector
  # scrapingdog has no integration but we can use httpx
  # install using `pip install httpx`
  import httpx
  # create an API client instance
  client = httpx.Client(timeout=180)
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
      payload = {
          "api_key": "YOUR API KEY",
          "url": url,
      }
      api_result = client.post(
          "https://api.scrapingdog.com/scrape",
          json=payload,
      )
      data = api_result.json()
      assert data['success'], f"scrape failed: {data['message']}"
      return Selector(data['html'])
  url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
  property_data = list(json.loads(data).values())[0]['property']
  # the resulting dataset is pretty big but here are some example fields:
  from pprint import pprint
  pprint(property_data)
  {
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
  "streetAddress": "1414-1416 20th Ave",
  "city": "San Francisco",
  "state": "CA",
  "zipcode": "94122",
  "neighborhood": null,
  "community": null,
  "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
  }

To scrape Zillow.com, we retrieve the HTML and extract the property dataset from a hidden JSON variable. As Zillow.com uses Next.js, this variable is available in the __NEXT_DATA__ script element.
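Note that the scrapers above call `json.loads` twice: the `gdpClientCache` value is itself a JSON-encoded string inside the already parsed `__NEXT_DATA__` document. The sketch below wraps that step with basic error handling so layout changes fail with a clear message. The `extract_property` helper, the sample payload and its cache key are hypothetical; only the key path is taken from the code above.

```python
import json


def extract_property(next_data: dict) -> dict:
    """Return the first property record from a parsed __NEXT_DATA__ dict."""
    try:
        cache_raw = next_data["props"]["pageProps"]["componentProps"]["gdpClientCache"]
    except (KeyError, TypeError) as exc:
        raise ValueError("gdpClientCache not found; page layout may have changed") from exc
    # gdpClientCache is a JSON string, so it needs a second decode
    cache = json.loads(cache_raw)
    for entry in cache.values():
        if isinstance(entry, dict) and "property" in entry:
            return entry["property"]
    raise ValueError("no property record in gdpClientCache")


# made-up miniature payload that mirrors the key path used by the scrapers
SAMPLE = {
    "props": {"pageProps": {"componentProps": {
        "gdpClientCache": json.dumps(
            {"hypothetical-cache-key": {"property": {"zpid": 332857311, "bedrooms": 7}}}
        )
    }}}
}

if __name__ == "__main__":
    print(extract_property(SAMPLE))
```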

Why scrape Zillow Listings?

Zillow is a popular web scraping target as it has a large amount of real estate data, from listing information to market trends and metadata.

With lead scraping, Zillow can be used to generate leads for real estate agents, estate owners and investors.

As real estate is one of the biggest markets in the world, Zillow is an invaluable market research tool. It can be used to analyze market trends down to minute details like specific neighborhoods and property types.

Zillow.com is also often scraped by real estate agents and investors to monitor competition and adjust their product and pricing strategies.
