Web Scraping Indeed.com Overview

2024-04-08

Indeed is one of the biggest job listing and recruitment portals in the world.

Indeed.com uses proprietary anti-scraping technology that is constantly updated, together with Cloudflare's anti-bot service. This makes it difficult to scrape Indeed data reliably, which is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested in our benchmarks perform well on Indeed.com, at an average cost of $3.51 per 1,000 scrape requests.

Indeed.com scraping API benchmarks

Scrapeway runs weekly benchmarks for Indeed Jobs for the most popular web scraping APIs. Here's the table for this week:

Service (rank)  Success %  Change  Speed   Change  Cost $/1000  Change
1               100%       +1      7.1s    +0.4    $2.2         =
2               99%        -1      6.3s    +3.4    $3.86        +0.02
3               84%        -9      16.6s   +0.3    $2.71        =
4               79%        +7      4.2s    =       $2.76        =
5               59%        -21     6.0s    -1.9    $9.8         =
6               4%         =       2.1s    +0.8    $3.27        =
7               0%                 -               -
Data range Nov 01 - Nov 08

How to scrape indeed.com?

Indeed.com is relatively easy to scrape: its pages are mostly static content with very few dynamic elements, so a headless browser is not required.

That being said, Indeed.com has several anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.

Indeed's HTML pages are well structured and minimal, so they can be easily parsed with traditional HTML parsing tools like XPath or CSS selectors. That's often unnecessary, though, as the entire page dataset is available in JSON variables like _initialData.
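As a rough illustration of reading such a JSON variable, here is a minimal sketch that uses only the Python standard library. The helper name `extract_json_variable` and the sample HTML are made up for this example and are not real Indeed markup. It also assumes the variable holds strict JSON, which may not hold for every page.

```python
import json
import re


def extract_json_variable(html: str, variable: str) -> dict:
    """Find `<variable> = {...}` in an HTML string and decode the JSON object."""
    # locate the assignment; re.escape keeps dots and brackets in the name literal
    match = re.search(re.escape(variable) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"variable {variable!r} not found in page")
    # raw_decode parses one JSON value starting at the given index and ignores
    # whatever follows it, so nested braces or "};" inside strings are handled
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# made-up page snippet for illustration only (assumption, not real Indeed markup)
sample_html = """
<html><body>
<script>window._initialData={"jobTitle": "Python {Dev};", "location": {"city": "Seattle"}};</script>
</body></html>
"""

initial_data = extract_json_variable(sample_html, "window._initialData")
print(initial_data["location"]["city"])
```

Decoding from the start of the value, rather than matching up to the first `};`, avoids truncating the data when that character sequence appears inside a string.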

Indeed.com scraper
import json
from parsel import Selector
# scrapingdog has no Python client library, so we call its HTTP API directly with httpx
# install using `pip install httpx parsel`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
# note: country, render_js and headers are accepted but not forwarded
# to the API in this example
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
        "premium": "true",
    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
assert data, "job data variable not found in the page"
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]

print(len(jobs))
15

from pprint import pprint
pprint(jobs[0])
{
 'applyCount': 0,
 'company': 'Pythonwise',
 'companyRating': 0,
 'companyReviewCount': 0,
 'createDate': 1568635928000,
 'jobLocationCity': 'Seattle',
 'jobLocationState': 'WA',
 'normTitle': 'Python developer',
 'organicApplyStartCount': 1493,
 'pubDate': 1568610000000,
 'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
 'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
 'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
            '0px;padding-left:20px;"> \n'
            ' <li>We are looking for a <b>Python</b> Web Developer responsible '
            'for developing, enhancing, modifying, maintaining applications '
            'and managing the interchange of data…</li>\n'
            '</ul>',
 'sourceId': 14854320,
 'sponsored': False,
 'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
  # ... and much more
}

In the Indeed.com scraper above, we extract JSON data from the HTML body, using a regular expression to select the JavaScript variable that holds it. This variable contains the entire job listing dataset.
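The raw job objects hold HTML snippets, relative links and numeric timestamps, so a cleanup step is usually useful before analysis. The sketch below is one possible way to flatten a job object into a small record. The `normalize_job` helper and the output field names are hypothetical, and it assumes `pubDate` is a Unix timestamp in milliseconds, which the example values suggest but the source does not confirm.

```python
import html
import re
from datetime import datetime, timezone
from urllib.parse import urljoin


def normalize_job(job: dict) -> dict:
    """Flatten one raw job object into a small, analysis-friendly record."""
    # strip tags from the HTML snippet, decode entities and collapse whitespace
    text = re.sub(r"<[^>]+>", " ", job.get("snippet", ""))
    text = " ".join(html.unescape(text).split())
    # assumption: pubDate is a Unix timestamp in milliseconds
    published = None
    if job.get("pubDate"):
        published = datetime.fromtimestamp(job["pubDate"] / 1000, tz=timezone.utc)
    location_parts = (job.get("jobLocationCity"), job.get("jobLocationState"))
    return {
        "title": job.get("normTitle"),
        "company": job.get("company"),
        "location": ", ".join(part for part in location_parts if part),
        "published": published,
        "sponsored": job.get("sponsored", False),
        # viewJobLink is relative, so resolve it against the site root
        "url": urljoin("https://www.indeed.com", job.get("viewJobLink", "")),
        "snippet": text,
    }


# trimmed copy of the example job object shown above
raw_job = {
    "company": "Pythonwise",
    "jobLocationCity": "Seattle",
    "jobLocationState": "WA",
    "normTitle": "Python developer",
    "pubDate": 1568610000000,
    "snippet": "<ul> \n <li>We are looking for a <b>Python</b> Web Developer</li>\n</ul>",
    "sponsored": False,
    "viewJobLink": "/viewjob?jk=1a20a1c56fb7df73",
}
record = normalize_job(raw_job)
print(record["published"].date(), record["location"], record["url"])
```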

Why scrape Indeed Jobs?

Web scraping indeed.com is a popular use case for job seekers, recruiters, and HR professionals.

With job monitoring scraping, we can keep track of job listings and how they change over time, giving insight into market trends. By scraping Indeed.com job search, we can also aggregate employment data for specific regions and fields. For example, by scraping "Python Developers in San Francisco" we can keep track of Python opportunities in one particular area and how they change over time.
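Tracking how listings change over time can be sketched as a comparison of two scrapes of the same search. The example below is a minimal illustration with made-up snapshot data. The helpers `job_key` and `diff_snapshots` are hypothetical, and using the `jk` query parameter of `viewJobLink` as a stable job identifier is an assumption based on the example output above.

```python
from urllib.parse import parse_qs, urlparse


def job_key(job: dict) -> str:
    """Return the `jk` query parameter of viewJobLink (assumed to identify a job)."""
    query = parse_qs(urlparse(job["viewJobLink"]).query)
    return query["jk"][0]


def diff_snapshots(previous: list, current: list) -> dict:
    """Compare two scrapes of the same search and report added and removed jobs."""
    old = {job_key(job): job for job in previous}
    new = {job_key(job): job for job in current}
    return {
        # sort the keys so the result order is deterministic
        "added": [new[key] for key in sorted(new.keys() - old.keys())],
        "removed": [old[key] for key in sorted(old.keys() - new.keys())],
        "unchanged_count": len(new.keys() & old.keys()),
    }


# made-up snapshots for illustration only
yesterday = [
    {"company": "Acme", "viewJobLink": "/viewjob?jk=aaa111&from=vjs"},
    {"company": "Globex", "viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
]
today = [
    {"company": "Globex", "viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
    {"company": "Initech", "viewJobLink": "/viewjob?jk=ccc333&from=vjs"},
]
changes = diff_snapshots(yesterday, today)
print([job["company"] for job in changes["added"]])
print([job["company"] for job in changes["removed"]])
```

Storing each day's result of `diff_snapshots` gives a simple time series of new and closed listings for a search query.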

Indeed data scraping can also be used in market research. It provides not only job listing details but also comprehensive company profile pages, which in combination can be used to create reliable market research graphs and reports.

Indeed.com is often scraped by recruiters who post their own job listings on the platform. Competitive analysis of this data helps them optimize their listings for current market trends.

Finally, Indeed contains a lot of user-generated content, such as company reviews, which can be used for sentiment analysis, reputation management, and AI training.