How to scrape realtor.com and which web scraping API to use

Realtor.com, based in California, is the second-biggest real estate listing website in the United States, with over 100 million monthly users. This makes it one of the most popular real estate targets for web scraping.

Realtor.com uses Kasada anti-bot protection together with proprietary anti-scraping technology to block web scrapers. This makes it difficult to scrape Realtor property data reliably, and this is where web scraping APIs come in handy.

Overall, only a few of the web scraping APIs we've tested in our benchmarks perform well for Realtor.com, at an average cost of $2.45 per 1,000 scrape requests.

Realtor.com scraping API benchmarks

Scrapeway runs weekly Realtor listing benchmarks for the most popular web scraping APIs. Here are this week's results:

#	Service	Success %	Speed	Cost $/1000
1	Scrapfly	100% (+2)	12.4s (+1.4)	$4.04 (-0.07)
2	WebScrapingAPI	94% (-2)	19.1s (+1.5)	$2.71 (=)
3	Scrapingbee	41% (-4)	15.0s (+1.5)	$3.27 (=)
4	Scraperapi	32% (-1)	5.9s (-1.1)	$4.90 (=)
5	Scrapingdog	31% (-2)	8.2s (-0.2)	$2.20 (=)
6	Zenrows	0%	-	-
7	Scrapingant	0%	-	-

Data range: Jul 05 - Jul 11
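The table lists cost per 1,000 requests next to the success rate, and the two interact: a cheap service with a low success rate may need many attempts per usable page. A rough way to compare them is the cost of 1,000 *successful* scrapes, i.e. the listed cost divided by the success rate. This is a minimal sketch under one loud assumption: every attempt, successful or not, is billed at the listed rate. Billing policies for failed requests may differ between providers, so check each one. The helper name `cost_per_1000_successes` is hypothetical, and the services with a 0% success rate are left out because the estimate is undefined for them.

```python
# Success rates and costs copied from the benchmark table above (Jul 05 - Jul 11).
BENCHMARK: dict[str, tuple[float, float]] = {
    "Scrapfly": (1.00, 4.04),
    "WebScrapingAPI": (0.94, 2.71),
    "Scrapingbee": (0.41, 3.27),
    "Scraperapi": (0.32, 4.90),
    "Scrapingdog": (0.31, 2.20),
}


def cost_per_1000_successes(success_rate: float, cost_per_1000: float) -> float:
    """Estimate the cost of 1,000 successful scrapes.

    ASSUMPTION: every attempt, successful or not, is billed at the listed rate,
    so on average 1 / success_rate attempts are paid for per successful scrape.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in the range (0, 1]")
    return cost_per_1000 / success_rate


if __name__ == "__main__":
    for service, (rate, cost) in BENCHMARK.items():
        estimate = cost_per_1000_successes(rate, cost)
        print(f"{service}: ${estimate:.2f} per 1,000 successful scrapes")
```

Under this assumption a service's low sticker price can be outweighed by its failure rate, which is why the success column matters as much as the cost column.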

How to scrape realtor.com?

In terms of data extraction, Realtor is one of the easiest targets to scrape: it's a JavaScript application that embeds all of its page data in the HTML as JSON, which means a headless browser is not required.

That being said, Realtor.com has many anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can keep up with its constantly changing anti-scraping measures. See the benchmarks above for the most up-to-date results.
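Even the best-performing services in the benchmark don't succeed on every request, so it can help to retry failed scrapes. Below is a minimal sketch of retrying with exponential backoff, assuming the scrape function raises an exception on failure (the `scrape()` examples below do this through `assert`, which raises `AssertionError`). The name `scrape_with_retries` and its parameters are hypothetical, not part of any scraping API's SDK.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def scrape_with_retries(
    scrape_fn: Callable[[str], T],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call scrape_fn(url), retrying with exponential backoff if it raises."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return scrape_fn(url)
        except Exception as error:  # e.g. the AssertionError raised by a failed scrape()
            last_error = error
            if attempt < attempts - 1:
                # wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed for {url}") from last_error
```

With one of the `scrape()` functions below, usage would look like `selector = scrape_with_retries(scrape, url)`. Each retry is a new API request, so it may add to your bill depending on the provider.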

Realtor's HTML pages store this data as JSON in Next.js framework variables like __NEXT_DATA__, which can be extracted to get the full listing dataset, making Realtor an easy scraping target overall.
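Because the data is embedded as JSON, extracting it is a plain HTML-parsing task. The scrapers below use parsel for this; as an alternative, here is a minimal sketch that relies only on Python's standard-library `html.parser`, assuming the page contains a single `<script id="__NEXT_DATA__">` element. The names `NextDataParser` and `extract_next_data` are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class NextDataParser(HTMLParser):
    """Collects the text content of the <script id="__NEXT_DATA__"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False

    def handle_data(self, data):
        # script contents may arrive in several chunks, so collect them all
        if self._inside:
            self._chunks.append(data)

    @property
    def text(self) -> str:
        return "".join(self._chunks)


def extract_next_data(html: str) -> Optional[dict[str, Any]]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the script is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads(parser.text) if parser.text else None
```

Applied to a Realtor page's HTML, `extract_next_data(html)` should return the same object as `json.loads(selector.css("script#__NEXT_DATA__::text").get())` in the examples below, without the parsel dependency.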

Realtor.com scraper

Example using Scrapfly:

  import json
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapfly-sdk`
  from scrapfly import ScrapflyClient, ScrapeConfig
  # create an API client instance
  client = ScrapflyClient(key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "US", render_js: bool = False, headers: dict = None) -> Selector:
    api_result = client.scrape(ScrapeConfig(
      url=url,
      asp=True,  # enable Anti Scraping Protection bypass
      country=country,
      render_js=render_js,
      headers=headers,
    ))
    return api_result.selector
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using WebScrapingAPI:

  import json
  from pprint import pprint
  from parsel import Selector
  # webscrapingapi has a Python SDK but it's not great, use httpx instead:
  # `pip install httpx`
  import httpx
  # create an API client instance
  client = httpx.Client(timeout=180)
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    # send the request to the WebScrapingAPI endpoint, not to realtor.com directly
    api_result = client.get(
      "https://api.webscrapingapi.com/v1",
      headers=headers,
      params={
        "url": url,
        "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
        "timeout": 60_000,
        "render_js": "1" if render_js else "0",
      },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(text=api_result.text)
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using Scrapingbee:

  import json
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapingbee`
  from scrapingbee import ScrapingBeeClient
  # create an API client instance
  client = ScrapingBeeClient(api_key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    api_result = client.get(
      url,
      headers=headers,
      params={
        "json_response": True,  # the page HTML is returned under the "body" key
        "transparent_status_code": True,
        "premium_proxy": "True",
        "render_js": str(render_js),
      },
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(text=data["body"])
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using Scraperapi:

  import json
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scraperapi`
  from scraper_api import ScraperAPIClient
  # create an API client instance
  client = ScraperAPIClient(api_key="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    api_result = client.get(
      url=url,
      headers=headers or {},
      premium=True,  # use premium proxies
    )
    assert api_result.ok, api_result.text
    return Selector(text=api_result.text)
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using Scrapingdog:

  import json
  from pprint import pprint
  from parsel import Selector
  # scrapingdog has no integration but we can use httpx
  # install using `pip install httpx`
  import httpx
  # create an API client instance
  client = httpx.Client(timeout=180)
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    payload = {
      "api_key": "YOUR API KEY",
      "url": url,
      "premium": "true",
    }
    api_result = client.post(
      "https://api.scrapingdog.com/scrape",
      json=payload,
    )
    data = api_result.json()
    assert data.get("success"), f"scrape failed: {data.get('message')}"
    return Selector(text=data["html"])
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using Zenrows:

  import json
  from pprint import pprint
  from parsel import Selector
  # install using `pip install zenrows`
  from zenrows import ZenRowsClient
  # create an API client instance
  client = ZenRowsClient(apikey="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    api_result = client.get(
      url,
      headers=headers,
      params={
        "json_response": "True",
        "premium_proxy": "True",
      },
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(text=data["html"])
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

Example using Scrapingant:

  import json
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapingant-client`
  from scrapingant_client import ScrapingAntClient
  # create an API client instance
  client = ScrapingAntClient(token="YOUR API KEY")
  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    api_result = client.general_request(
      url,
      browser=False,
      return_page_source=False,
      proxy_type="residential",
    )
    assert api_result.ok, api_result.text
    return Selector(text=api_result.text)
  url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
  selector = scrape(url)
  # The entire dataset can be found in a javascript variable:
  data = selector.css("script#__NEXT_DATA__::text").get()
  assert data, "__NEXT_DATA__ script not found in the page"
  # NOTE: this key path depends on Realtor's current page structure and may change
  data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
  # The resulting dataset is pretty big but here are some example fields:
  pprint(data)
  # {
  #   'property_id': '2181349460',
  #   'status': 'for_sale',
  #   'price_per_sqft': 2190,
  #   'photo_count': 45,
  #   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
  #   # and much more
  # }

In the examples above, we retrieve the page HTML and extract the entire page dataset from a hidden JSON variable. As Realtor.com uses Next.js, this variable is available in the __NEXT_DATA__ script element.
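The exact key path inside __NEXT_DATA__ depends on how Realtor builds its pages and can change, so a hard-coded path like `["props"]["pageProps"]["initialReduxState"]` may eventually break. One defensive option, a sketch rather than part of the scrapers above, is to search the parsed JSON recursively for the field you need. The helper name `find_values` is hypothetical.

```python
from typing import Any, Iterator


def find_values(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # keep searching inside the value as well
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)
```

For example, `next(find_values(json.loads(data), "property_id"), None)` would return the first `property_id` found anywhere in the page data, or `None` if it's missing. The trade-off is that a generic key name may match more than one object, so check which match you actually need.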

Why scrape Realtor Listings?

As the second-biggest real estate listing website in the US, Realtor has a large amount of real estate data, from listing information to market trends and metadata.

Through lead scraping, Realtor can be used to generate leads for real estate agents, property owners and investors.

As real estate is one of the biggest markets in the world, Realtor is an invaluable market research tool. It can be used to analyze everything from broad market trends down to minute details like specific neighborhoods and property types.
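As a small illustration of neighborhood- or property-type-level analysis, here is a sketch that groups scraped listings and takes the median price per square foot of each group. It assumes each listing has already been flattened into a dict with a `price_per_sqft` field (as in the example output above) plus some grouping field; the helper name and grouping keys such as `"postal_code"` are hypothetical, not confirmed Realtor field names.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Iterable


def median_price_per_sqft(listings: Iterable[dict[str, Any]], group_key: str) -> dict[Any, float]:
    """Group listings by `group_key` and return the median price_per_sqft of each group."""
    groups: defaultdict[Any, list[float]] = defaultdict(list)
    for listing in listings:
        price = listing.get("price_per_sqft")
        group = listing.get(group_key)
        # skip listings that are missing either field
        if price is None or group is None:
            continue
        groups[group].append(price)
    return {group: median(prices) for group, prices in groups.items()}
```

The median is used instead of the mean so that a few luxury listings don't skew a neighborhood's figure.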

Realtor.com is also often scraped by real estate agents and investors to monitor the competition and adjust their product and pricing strategies.
