Realtor.com is the second biggest real estate listing website in the United States. Based in California, it has over 100 million monthly users, which makes it one of the most popular real estate targets for web scraping.
Realtor.com uses Kasada anti-bot protection together with proprietary anti-scraping technology to block web scraping. This makes it difficult to scrape Realtor property data reliably, which is where web scraping APIs come in handy.
Overall, only a few of the web scraping APIs we've tested in our benchmarks perform well for Realtor.com, at an average of $2.23 per 1,000 scrape requests.
Realtor.com scraping API benchmarks
Scrapeway runs weekly benchmarks for Realtor Listings for the most popular web scraping APIs. Here's the table for this week:
Service | Success % (weekly change) | Speed (weekly change) | Cost $/1000 (weekly change)
---|---|---|---
1 | 100% (+1) | 12.8s (-2.9) | $2.51 (-0.62)
2 | 99% (+46) | 7.5s (-2.1) | $4.9 (=)
3 | 99% (-1) | 7.6s (+1.0) | $2.2 (=)
4 | 79% (+2) | 34.5s (+2.4) | $2.71 (=)
5 | 23% (-38) | 3.1s (+0.1) | $3.27 (=)
6 | 0% | - | -
7 | 0% | - | -
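
One way to read the table is to convert the listed price into a price per 1,000 successful scrapes. The sketch below assumes that failed requests are billed the same as successful ones, which the benchmark does not state and which may not hold for every service. Services are identified only by their rank in the table, and `cost_per_1000_successes` is a hypothetical helper written for this illustration.

```python
# Rows from this week's table: rank -> (success rate in %, cost in $ per 1,000 requests).
# Ranks 6 and 7 are left out because they had a 0% success rate and no cost data.
BENCHMARK = {
    1: (100, 2.51),
    2: (99, 4.90),
    3: (99, 2.20),
    4: (79, 2.71),
    5: (23, 3.27),
}


def cost_per_1000_successes(success_pct: float, cost_per_1000: float) -> float:
    """Cost of 1,000 successful scrapes, ASSUMING failed requests are billed too."""
    if success_pct <= 0:
        raise ValueError("success rate must be positive")
    return cost_per_1000 / (success_pct / 100)


effective = {
    rank: round(cost_per_1000_successes(pct, cost), 2)
    for rank, (pct, cost) in BENCHMARK.items()
}

# Print the services from cheapest to most expensive per usable result.
for rank, cost in sorted(effective.items(), key=lambda item: item[1]):
    print(f"service #{rank}: ${cost:.2f} per 1,000 successful scrapes")
```

Under that billing assumption, a low success rate inflates the real price considerably: the service with a 23% success rate costs several times its listed price per usable result, while the services near 100% stay close to their listed price.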
How to scrape realtor.com?
In terms of data extraction, Realtor is one of the easiest targets to scrape: it's a highly efficient JavaScript application that embeds all of its data in the page as JSON, so a headless browser is not required.
That being said, Realtor.com has a lot of anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.
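Because blocking comes and goes as the anti-scraping measures change, a single request can fail even through a well-performing service, so scrapers usually retry. Here is a minimal sketch of retrying with exponential backoff. `with_retries`, its parameters and the `flaky_fetch` stand-in are hypothetical names used for illustration and are not part of any scraping API's SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, waiting base_delay * 2**n between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # in real code, catch only network/HTTP errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2**attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


# Stand-in for a scrape call that is blocked twice before it succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("blocked")
    return "<html>ok</html>"


# Record the delays instead of sleeping so the example runs instantly.
delays = []
html = with_retries(flaky_fetch, attempts=3, base_delay=1.0, sleep=delays.append)
print(html, delays)
```

Passing the `sleep` function in as a parameter keeps the example fast and easy to test; in a real scraper the default `time.sleep` would be used, and `fetch` would wrap the actual API call.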
Realtor's HTML pages carry their data as JSON inside Next.js framework variables like `__NEXT_DATA__`, from which full listing datasets can be easily extracted, making it an easy scraping target overall.
import json
from pprint import pprint

# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx parsel`
import httpx
from parsel import Selector

# API endpoint (assumed here; confirm it against the service's documentation)
API_URL = "https://api.webscrapingapi.com/v1"

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
    params = {
        "url": url,  # the page to scrape is passed as a parameter
        "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
        "timeout": 60_000,
        "render_js": "1" if render_js else "0",  # not needed for Realtor.com
    }
    if country:
        # assumes the service accepts a `country` parameter for proxy location
        params["country"] = country
    # the request goes to the API endpoint, not to realtor.com directly
    api_result = client.get(API_URL, headers=headers, params=params)
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)

url = "https://www.realtor.com/realestateandhomes-detail/16-Sea-Cliff-Ave_San-Francisco_CA_94121_M21813-49460"
selector = scrape(url)
# The entire dataset can be found in a javascript variable:
data = selector.css("script#__NEXT_DATA__::text").get()
data = json.loads(data)["props"]["pageProps"]["initialReduxState"]
# The resulting dataset is pretty big but here are some example fields:
pprint(data)
{
    'property_id': '2181349460',
    'status': 'for_sale',
    'price_per_sqft': 2190,
    'photo_count': 45,
    'primary_photo': {'href': 'https://ap.rdcpix.com/48bce403ce912cb3e41bf38df9526a4al-b3276956225s.jpg'}
    # and much more
}
To scrape Realtor.com in the example above, we retrieve the HTML and extract the entire page dataset from a hidden JSON variable. As realtor.com uses Next.js, this variable is available in the `__NEXT_DATA__` script element.
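The extraction step can also be tried offline, without an API key. The sketch below pulls the `__NEXT_DATA__` payload out of a small hand-written HTML sample using only the Python standard library, then searches the nested result for a field by name. The sample page, the `propertyDetails` key, and the `NextDataParser`, `extract_next_data` and `find_key` helpers are all made up for illustration; a real Realtor.com page holds far more data and its internal layout may differ.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ JSON, or raise if the script is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("__NEXT_DATA__ script not found (blocked page or layout change?)")
    return json.loads("".join(parser.chunks))


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hand-written sample for illustration only; this is not real Realtor.com output.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"initialReduxState": {"propertyDetails":
  {"property_id": "2181349460", "status": "for_sale"}}}}}
</script>
</body></html>
"""

data = extract_next_data(SAMPLE_HTML)["props"]["pageProps"]["initialReduxState"]
print(next(find_key(data, "status")))
```

Searching by key name rather than by a fixed path is a deliberate choice here: the dataset is large and deeply nested, so a recursive lookup is a quick way to explore it, and the explicit error on a missing script helps tell a blocked response apart from a parsing bug.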
Why scrape Realtor Listings?
Realtor is the second biggest real estate listing website in the US, so it holds a large amount of real estate data, from listing information to market trends and metadata.
Through lead scraping, Realtor can be used to generate leads for real estate agents, estate owners and investors.
As real estate is one of the biggest markets in the world, Realtor is an invaluable market research tool. It can be used to analyze market trends down to minute details like specific neighborhoods and property types.
Realtor.com is also often scraped by real estate agents and investors to monitor competition and adjust their product and pricing strategies.