Web Scraping Realtor.com Overview

2024-04-08

Realtor.com, based in California, is the second-biggest real estate listing website in the United States, with over 100 million monthly users. This makes it one of the most popular real estate targets for web scraping.

Realtor.com uses Kasada anti-bot protection together with proprietary anti-scraping technology to block web scraping. This makes it difficult to scrape Realtor property data reliably, and this is where web scraping APIs come in handy.

Overall, only a few of the web scraping APIs we've tested in our benchmarks perform well on Realtor.com, at an average of $2.37 per 1,000 scrape requests.

Realtor.com scraping API benchmarks

Scrapeway runs weekly benchmarks of the most popular web scraping APIs on Realtor listings. Here's this week's table:

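Rank  Success %    Speed         Cost $/1000
1     100% (+15)   6.2s (-1.1)   $2.20 (=)
2     99% (-1)     18.0s (+1.5)  $3.51 (+0.2)
3     89% (-8)     7.3s (-0.3)   $4.90 (=)
4     74% (-6)     34.5s (-1.7)  $2.71 (=)
5     22% (-3)     3.6s (=)      $3.27 (=)
6     0%           -             -
7     0%           -             -

Values in parentheses are the change since the previous weekly benchmark; "=" means unchanged.
Data range: Oct 18 - Oct 25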
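The listed price per 1,000 requests doesn't account for failures. As a rough sketch, assuming every attempt is billed (including failed ones) and failed requests are simply retried, each successful scrape takes on average 1 / success rate attempts, so the effective price of 1,000 successful scrapes is the listed price divided by the success rate. The numbers below are copied from the table above (rows 6 and 7 are omitted since nothing succeeded); `BenchmarkRow` and the helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    rank: int
    success_rate: float   # fraction of successful requests, 0.0 to 1.0
    cost_per_1000: float  # listed USD price per 1,000 requests


# Rows 1-5 from the benchmark table; rows with 0% success have no usable cost.
ROWS = [
    BenchmarkRow(1, 1.00, 2.20),
    BenchmarkRow(2, 0.99, 3.51),
    BenchmarkRow(3, 0.89, 4.90),
    BenchmarkRow(4, 0.74, 2.71),
    BenchmarkRow(5, 0.22, 3.27),
]


def cost_per_1000_successes(row: BenchmarkRow) -> float:
    """Effective USD price of 1,000 *successful* scrapes.

    Assumption: failed attempts are billed and retried, so on average
    1 / success_rate attempts are paid for per successful scrape.
    """
    if row.success_rate <= 0:
        raise ValueError("success rate must be positive")
    return row.cost_per_1000 / row.success_rate


def budget_for_listings(row: BenchmarkRow, listings: int) -> float:
    """Estimated USD spend to collect `listings` pages under the same assumption."""
    return listings / 1000 * cost_per_1000_successes(row)


for row in ROWS:
    print(f"#{row.rank}: ${cost_per_1000_successes(row):.2f} per 1,000 successful scrapes")
```

Under that assumption, the 5th-ranked service's listed $3.27 works out to about $14.86 per 1,000 successful scrapes, while the top-ranked service stays at $2.20. If a service doesn't bill failed requests, its listed price already is the effective price.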

How to scrape realtor.com?

Realtor is one of the easiest targets to scrape, as it's a highly efficient JavaScript application that stores all of its page data in JSON format, which means a headless browser is not required.

That being said, Realtor.com has a lot of anti-scraping technology in place, so it's recommended to use a reliable web scraping service that can bypass its constantly changing anti-scraping measures. See the benchmarks above for the most up-to-date results.
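Because anti-bot blocks tend to be intermittent, it can help to retry failed scrape calls. Below is a minimal, generic sketch of retrying with exponential backoff. `with_retries` is a hypothetical helper, not part of Scrapingdog or any other scraping API, and it assumes a failed scrape surfaces as a raised exception.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `fn` until it succeeds, backing off exponentially between failures.

    Re-raises the last error once `attempts` calls have all failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # wait base_delay * 1, 2, 4, ... plus random jitter of up to base_delay
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, `with_retries(lambda: scrape(url), attempts=4)` would retry the scraper's `scrape()` call up to four times. Keep in mind that if a service bills failed requests, every retry adds to the cost.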

Realtor's HTML pages store their data as JSON in Next.js framework variables such as __NEXT_DATA__, which can be easily extracted to get full listing datasets, making Realtor an easy scraping target overall.
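If you'd rather not depend on parsel, the `__NEXT_DATA__` extraction can also be sketched with Python's standard library alone. This is an illustrative alternative, not how the scraper below works: `NextDataParser` and `extract_next_data` are hypothetical names, and the sketch assumes the JSON sits in a `<script id="__NEXT_DATA__">` tag, as on Next.js pages.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> while parsing HTML."""

    def __init__(self):
        super().__init__()
        self._in_next_data = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._in_next_data = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_next_data = False

    def handle_data(self, data):
        # script contents may arrive in several pieces, so collect them all
        if self._in_next_data:
            self.chunks.append(data)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ JSON, or raise ValueError if it's missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads("".join(parser.chunks))
```

A missing `__NEXT_DATA__` tag is raised as an error rather than returned as `None`, since on a protected site like Realtor.com it often means the request was served a block page instead of the listing.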

Realtor.com scraper
import json
from pprint import pprint

# install dependencies using `pip install httpx parsel`
import httpx
from parsel import Selector

# there's no Scrapingdog integration to use here, so we call its HTTP API directly with httpx
client = httpx.Client(timeout=180)


# scrape function that returns an HTML parser (parsel Selector) for a given URL
def scrape(url: str) -> Selector:
    params = {
        "api_key": "YOUR API KEY",
        "url": url,
        # enable Scrapingdog's premium proxy option
        "premium": "true",
    }
    # the scrape endpoint takes its options as query parameters
    # and responds with the target page's HTML
    api_result = client.get("https://api.scrapingdog.com/scrape", params=params)
    if api_result.status_code != 200:
        raise RuntimeError(f"scrape failed ({api_result.status_code}): {api_result.text[:200]}")
    return Selector(text=api_result.text)


url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
selector = scrape(url)

# The entire dataset can be found in a JavaScript variable:
next_data = selector.css("script#__NEXT_DATA__::text").get()
if next_data is None:
    raise RuntimeError("__NEXT_DATA__ script not found; the page may have been blocked")
data = json.loads(next_data)["props"]["pageProps"]["initialReduxState"]

# The resulting dataset is pretty big but here are some example fields:
pprint(data)
# {
#   'property_id': '2181349460',
#   'status': 'for_sale',
#   'price_per_sqft': 2190,
#   'photo_count': 45,
#   'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'},
#   # and much more
# }

Above, we retrieve the Realtor.com page HTML and extract the entire page dataset from a hidden JSON variable. Since Realtor.com uses Next.js, this variable is available in the __NEXT_DATA__ script tag.
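Because the dataset is large and deeply nested, it's often convenient to pull the fields you need into a flat record. The sketch below uses the example fields shown in the scraper output. It assumes they sit at the top level of the `data` dictionary, as in that output; Realtor may nest or rename them, so missing values come back as `None` instead of raising. `pick` and `summarize_listing` are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def pick(data: Dict[str, Any], *path: str) -> Optional[Any]:
    """Walk nested dict keys, returning None instead of raising if any key is missing."""
    current: Any = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def summarize_listing(state: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the example listing fields into one record (assumed top-level keys)."""
    return {
        "property_id": pick(state, "property_id"),
        "status": pick(state, "status"),
        "price_per_sqft": pick(state, "price_per_sqft"),
        "photo_count": pick(state, "photo_count"),
        "primary_photo": pick(state, "primary_photo", "href"),
    }
```

With the scraper above, `summarize_listing(data)` would return a small dict that's easy to write to CSV or a database.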

Why scrape Realtor Listings?

As the second-biggest real estate listing website in the US, Realtor has a large amount of real estate data, from listing information to market trends and metadata.

Through lead scraping, Realtor can be used to generate leads for real estate agents, property owners and investors.

As real estate is one of the biggest markets in the world, Realtor is an invaluable market research tool. It can be used to analyze market trends down to minute details like specific neighborhoods and property types.
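As an example of this kind of analysis, the following sketch groups scraped listings and takes the median `price_per_sqft` per group. It assumes each listing has already been flattened into a dict. The grouping field name you pass (for example a neighborhood or property-type key) is hypothetical, since the exact field names in Realtor's dataset may differ.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def median_price_per_sqft(listings: Iterable[dict], group_key: str) -> Dict[str, float]:
    """Median price_per_sqft for each value of `group_key`.

    Listings missing either the group field or price_per_sqft are skipped.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for listing in listings:
        group = listing.get(group_key)
        price = listing.get("price_per_sqft")
        if group is None or price is None:
            continue
        groups[group].append(price)
    return {group: median(prices) for group, prices in groups.items()}
```

The median is used rather than the mean so that a handful of luxury or distressed listings don't skew a neighborhood's figure.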

Realtor.com is also often scraped by real estate agents and investors to monitor competition and adjust their product and pricing strategies.
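Competition monitoring usually comes down to re-scraping the same listings periodically and comparing snapshots. Here is a minimal sketch that assumes you've stored each scrape as a hypothetical `{property_id: price}` mapping; `price_changes` is an illustrative helper, not part of any library.

```python
from typing import Dict, List, Tuple


def price_changes(
    old: Dict[str, float], new: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Compare two {property_id: price} snapshots taken at different times.

    Returns (property_id, old_price, new_price) for listings present in both
    snapshots whose price changed, sorted by property_id.
    """
    changes = []
    for property_id in sorted(old.keys() & new.keys()):
        if old[property_id] != new[property_id]:
            changes.append((property_id, old[property_id], new[property_id]))
    return changes
```

Listings that appear in only one snapshot (new or delisted properties) are ignored here; `new.keys() - old.keys()` and `old.keys() - new.keys()` would give those separately.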