Web scraping zillow.com

Last updated: 2024-04-08

Zillow is one of the biggest real estate listing websites in the United States and holds a vast amount of current and historical real estate data. This makes it the most popular real estate target for web scraping.

Zillow.com uses its own proprietary web scraping protection technology in combination with the PerimeterX anti-bot service. This makes it difficult to scrape Zillow property data reliably, which is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for Zillow.com at $2.18 per 1,000 scrape requests on average.

Zillow.com scraping API benchmarks

Scrapeway runs bi-weekly benchmarks for Zillow Listings against the most popular web scraping APIs. Here's the ranking for this period:

Web scraping API benchmark for zillow.com — success rate, speed, cost per 1,000 requests. Data: 2026-05-02 to 2026-05-08.
#	Service	Success	Speed	Cost/1k	Reviews
1 🥇	Firecrawl	100% +1	4.9s -0.8	$6.33 -4.3	—
2 🥈	Scrapfly	99% =	4.4s +0.8	$2.66 +0.1	(237) ★ 4.9
3 🥉	WebScrapingAPI	98% -1	18.1s +2.7	$2.71 =	—
4	Scrapingdog	95% +38	3.6s -1.4	$5.0 =	—
5	Scraperapi	72% +3	17.6s +5.8	$0.49 =	(62) ★ 4.6
6	Zenrows	34% +28	2.6s -0.5	$0.28 =	(103) ★ 4.8
7	Scrapingbee	0% —	— —	— —	(137) ★ 4.9
8	Scrapingant	0% —	— —	— —	—
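Success rate and speed interact: a fast service that fails often can be slower per usable page than a slower, more reliable one. The sketch below estimates the expected number of attempts and the expected time per successful scrape from the table's figures. It assumes failed requests are retried one after another and that each attempt succeeds independently at the benchmarked rate; both are simplifying assumptions, and the function names are illustrative.

```python
# Figures copied from the benchmark table: (success rate, seconds per request).
# Services with a 0% success rate are left out because they never succeed.
BENCHMARK = {
    "Firecrawl": (1.00, 4.9),
    "Scrapfly": (0.99, 4.4),
    "WebScrapingAPI": (0.98, 18.1),
    "Scrapingdog": (0.95, 3.6),
    "Scraperapi": (0.72, 17.6),
    "Zenrows": (0.34, 2.6),
}


def expected_attempts(success_rate: float) -> float:
    """Mean number of attempts until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def expected_seconds(success_rate: float, seconds_per_request: float) -> float:
    """Mean time spent per successful scrape when retrying sequentially."""
    return expected_attempts(success_rate) * seconds_per_request


def rank_by_expected_seconds(benchmark: dict) -> list:
    """Return (service, attempts, seconds) rows sorted by expected seconds."""
    rows = [
        (name, round(expected_attempts(p), 2), round(expected_seconds(p, s), 1))
        for name, (p, s) in benchmark.items()
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for name, attempts, seconds in rank_by_expected_seconds(BENCHMARK):
        print(f"{name:<15} {attempts:>5} attempts  {seconds:>5}s per success")
```

Under these assumptions Zenrows needs roughly three attempts per successful page, so its 2.6s response time works out to about 7.6s per usable result.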

How to scrape zillow.com?

Zillow is one of the easiest targets to scrape: it's a highly efficient JavaScript application that stores all of its data in JSON format, which means a headless browser is not required.

That being said, Zillow.com has a lot of anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.
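Since anti-scraping measures change and even well-performing services fail some requests, it usually helps to retry failed scrapes. The following is a minimal sketch of a retry wrapper with exponential backoff. `with_retries`, its parameters and the `flaky_fetch` stand-in are hypothetical names for illustration and are not part of any scraping service's SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, waiting base_delay * 2**n between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # in real code, catch the client's own error types
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


# Demo with a stand-in fetcher that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("blocked")
    return "<html>ok</html>"


delays = []  # record the waits instead of actually sleeping
html = with_retries(flaky_fetch, attempts=4, sleep=delays.append)
print(html, delays)
```

In a real scraper, `fetch` would be something like `lambda: scrape(url)`, and the `sleep` argument can be left at its default.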

Zillow's HTML pages embed their data as JSON in Next.js framework variables like __NEXT_DATA__, which can be easily extracted to get full listing datasets. This makes Zillow an easy scraping target overall.
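To make the __NEXT_DATA__ structure concrete, here is a minimal sketch that pulls the property record out of a page using only the Python standard library. The sample HTML is a hypothetical, heavily trimmed stand-in for a real Zillow page, and the `ForSaleQuery` cache key is invented for the example. The key path (`props`, `pageProps`, `componentProps`, `gdpClientCache`) is the same one used in the full scraper, and Zillow may change it at any time.

```python
import json
import re

# Hypothetical, trimmed stand-in for a Zillow property page.
# Note that gdpClientCache holds a JSON *string*, not a nested object.
_inner = json.dumps(
    {"ForSaleQuery": {"property": {"zpid": 332857311, "city": "San Francisco"}}}
)
_next_data = {"props": {"pageProps": {"componentProps": {"gdpClientCache": _inner}}}}
SAMPLE_HTML = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    + json.dumps(_next_data)
    + "</script></body></html>"
)

NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_property(html: str) -> dict:
    """Return the first property record found in the page's __NEXT_DATA__ script."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        # A missing script usually means a block or captcha page was returned.
        raise ValueError("__NEXT_DATA__ script not found")
    page = json.loads(match.group(1))
    cache = page["props"]["pageProps"]["componentProps"]["gdpClientCache"]
    # The cache is JSON-encoded a second time, so decode it again.
    cache = json.loads(cache)
    return next(iter(cache.values()))["property"]


if __name__ == "__main__":
    print(extract_property(SAMPLE_HTML))
```

A regular expression is enough for this single script tag, but an HTML parser such as parsel is more robust on real pages.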

Code example

zillow_scraper.py

import json
from pprint import pprint
from parsel import Selector

# create an API client instance here (depends on the service you choose)
# create scrape function that returns HTML parser for a given URL
# (placeholder: use one of the service-specific implementations below)
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    raise NotImplementedError("implement using a web scraping API client")

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
pprint(property_data)

Output $ python zillow_scraper.py

{
  "listingDataSource": "Phoenix",
  "zpid": 332857311,
  "city": "San Francisco",
  "state": "CA",
  "homeStatus": "FOR_SALE",
  "address": {
    "streetAddress": "1414-1416 20th Ave",
    "city": "San Francisco",
    "state": "CA",
    "zipcode": "94122",
    "neighborhood": null,
    "community": null,
    "subdivision": null
  },
  "bedrooms": 7,
  "bathrooms": 3,
  "price": 1695000,
  "yearBuilt": 1924,
  "streetAddress": "1414-1416 20th Ave",
  "zipcode": "94122",
  # ...
  # and much more
  # ...
}
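Because the full property record is large, it is often convenient to keep only a handful of fields. The sketch below flattens the example fields shown in the output into a single row and writes rows as CSV. `summarize` and `to_csv` are illustrative helper names, and the chosen field list is an assumption about what a typical project needs.

```python
import csv
import io

# Sample record using the example fields from the output above.
SAMPLE_PROPERTY = {
    "zpid": 332857311,
    "homeStatus": "FOR_SALE",
    "address": {
        "streetAddress": "1414-1416 20th Ave",
        "city": "San Francisco",
        "state": "CA",
        "zipcode": "94122",
    },
    "bedrooms": 7,
    "bathrooms": 3,
    "price": 1695000,
    "yearBuilt": 1924,
}


def summarize(property_data: dict) -> dict:
    """Flatten a property record into one row; missing fields become None."""
    address = property_data.get("address") or {}
    return {
        "zpid": property_data.get("zpid"),
        "status": property_data.get("homeStatus"),
        "price": property_data.get("price"),
        "bedrooms": property_data.get("bedrooms"),
        "bathrooms": property_data.get("bathrooms"),
        "year_built": property_data.get("yearBuilt"),
        "street": address.get("streetAddress"),
        "city": address.get("city"),
        "state": address.get("state"),
        "zipcode": address.get("zipcode"),
    }


def to_csv(rows: list) -> str:
    """Serialize summarized rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([summarize(SAMPLE_PROPERTY)]))
```

Using `.get()` throughout keeps the helper from failing when a listing lacks a field, such as a lot with no `yearBuilt`.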

import json
from parsel import Selector

# install using `pip install scrapfly-sdk`
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

# create an API client instance
client = ScrapflyClient(key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.scrape(ScrapeConfig(
            url=url,
            asp=True,
            render_js=False,
            cache=False,
            cache_ttl=900,
            method='GET',
    ))
    return api_result.selector

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        # assumed WebScrapingAPI endpoint; confirm it against the service's documentation
        "https://api.webscrapingapi.com/v1",
        headers=headers,
        params={
            "url": url,
            "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
            "timeout": 60_000,
            "render_js": "False",
            "method": "GET",
        },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# scrapingdog has no integration but we can use httpx
# install using `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
        "dynamic": "True",
        "api_url": "https://api.scrapingdog.com/scrape",
        "premium": "True",
        "method": "GET",
    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# install using `pip install scraperapi`
from scraper_api import ScraperAPIClient

# create an API client instance
client = ScraperAPIClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url=url,
        headers=headers or {},
        render=False,
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# install using `pip install zenrows`
from zenrows import ZenRowsClient

# create an API client instance
client = ZenRowsClient(apikey="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": "True",
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['html'])

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient

# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['body'])

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

import json
from parsel import Selector

# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient

# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.general_request(
        url,
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

url = "https://www.zillow.com/homedetails/1414-1416-20th-Ave-San-Francisco-CA-94122/332857311_zpid/"
selector = scrape(url)

# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["componentProps"]["gdpClientCache"]
property_data = list(json.loads(data).values())[0]['property']

# the resulting dataset is pretty big but here are some example fields:
from pprint import pprint
pprint(property_data)

To scrape Zillow.com, we retrieve the HTML and extract the property dataset from a hidden JSON variable. As Zillow.com uses Next.js, this variable is available in the __NEXT_DATA__ script element.

Why scrape Zillow Listings?

Zillow is a popular web scraping target because it holds a large amount of real estate data, from listing information to market trends and metadata. Through lead scraping, Zillow can be used to generate leads for real estate agents, estate owners and investors.

As real estate is one of the biggest markets in the world, Zillow is an invaluable market research tool. It can be used to analyze market trends down to minute details like specific neighborhoods and property types.

Zillow.com is also often scraped by real estate agents and investors to monitor competition and adjust their product and pricing strategies.