How to scrape indeed.com and which web scraping API to use

Indeed is one of the biggest job listing and recruitment portals in the world.

Indeed.com uses proprietary, constantly updated scraping protection together with Cloudflare's anti-bot service. This makes it difficult to scrape Indeed data reliably, which is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for scraping Indeed.com at $2.8 per 1,000 scrape requests on average.

Indeed.com scraping API benchmarks

Scrapeway runs weekly benchmarks for Indeed Jobs for the most popular web scraping APIs. Here's the table for this week:

Rank	Service	Success % (change)	Speed (change)	Cost $/1000 (change)
1	Scrapfly	98% -2	26.7s -0.1	$12.0 =
2	WebScrapingAPI	94% -2	21.7s +0.9	$2.71 =
3	Scraperapi	51% =	9.9s +3.6	$4.9 =
4	Scrapingbee	0%	-	-
5	Scrapingdog	0%	-	-
6	Zenrows	0%	-	-
7	Scrapingant	0%	-	-

Data range Jun 28 - Jul 04

How to scrape indeed.com?

Indeed.com is relatively easy to scrape as it's mostly static content with very few dynamic elements, so a headless browser is not required.

That being said, Indeed.com has several anti-scraping technologies in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.

Indeed's HTML pages are well structured and minimal, so they can be easily parsed using traditional HTML parsing tools like XPath or CSS selectors. However, that's often unnecessary, as the entire page dataset is available in JSON variables like _initialData.

Indeed.com scraper

  import json
  from parsel import Selector
  # install using `pip install scrapfly-sdk`
  from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

  # create an API client instance
  client = ScrapflyClient(key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.scrape(ScrapeConfig(
          url=url,
          asp=True,
          # use the requested proxy country, falling back to the US
          country=country or "US",
      ))
      return api_result.selector

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # webscrapingapi has a Python SDK but it's not great, use httpx instead:
  # `pip install httpx`
  import httpx

  # create an API client instance
  client = httpx.Client(timeout=180)

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          # NOTE: the request must go to the API endpoint, not to the target page.
          # This endpoint URL is an assumption; confirm it in the WebScrapingAPI docs.
          "https://api.webscrapingapi.com/v1",
          headers=headers,
          params={
              "url": url,  # the page to scrape
              "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
              "timeout": 60_000,
              "render_js": "1",
          },
      )
      assert api_result.status_code == 200, api_result.reason_phrase
      return Selector(api_result.text)

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # install using `pip install scraperapi`
  from scraper_api import ScraperAPIClient

  # create an API client instance
  client = ScraperAPIClient(api_key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url=url,
          headers=headers or {},
          premium=True,
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # install using `pip install scrapingbee`
  from scrapingbee import ScrapingBeeClient

  # create an API client instance
  client = ScrapingBeeClient(api_key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": True,
              "transparent_status_code": True,
              "premium_proxy": "True",
              "render_js": "False",
          },
      )
      assert api_result.ok, api_result.text
      # with json_response enabled the page HTML is returned in the "body" field
      data = api_result.json()
      return Selector(data['body'])

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # scrapingdog has no integration but we can use httpx
  # install using `pip install httpx`
  import httpx

  # create an API client instance
  client = httpx.Client(timeout=180)

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      payload = {
          "api_key": "YOUR API KEY",
          "url": url,
          "premium": "true",
      }
      api_result = client.post(
          "https://api.scrapingdog.com/scrape",
          json=payload,
      )
      data = api_result.json()
      assert data['success'], f"scrape failed: {data['message']}"
      return Selector(data['html'])

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # install using `pip install zenrows`
  from zenrows import ZenRowsClient

  # create an API client instance
  client = ZenRowsClient(apikey="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": "True",
              "premium_proxy": "True",
          },
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data['html'])

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

  import json
  from parsel import Selector
  # install using `pip install scrapingant-client`
  from scrapingant_client import ScrapingAntClient

  # create an API client instance
  client = ScrapingAntClient(token="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.general_request(
          url,
          browser=False,
          return_page_source=False,
          proxy_type='residential',
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)

  # example search page url:
  url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
  selector = scrape(url, country="US")
  # Indeed jobs can be found in a JavaScript variable as an array of job objects:
  data = selector.re(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
  data = json.loads(data[0])
  jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
  print(len(jobs))  # example output: 15
  from pprint import pprint
  pprint(jobs[0])
  {
      'applyCount': 0,
      'company': 'Pythonwise',
      'companyRating': 0,
      'companyReviewCount': 0,
      'createDate': 1568635928000,
      'jobLocationCity': 'Seattle',
      'jobLocationState': 'WA',
      'normTitle': 'Python developer',
      'organicApplyStartCount': 1493,
      'pubDate': 1568610000000,
      'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
      'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
      'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
                 '0px;padding-left:20px">\n'
                 '<li>We are looking for a <b>Python</b> Web Developer responsible '
                 'for developing, enhancing, modifying, maintaining applications '
                 'and managing the interchange of data…</li>\n'
                 '</ul>',
      'sourceId': 14854320,
      'sponsored': False,
      'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
      # ... and much more
  }

To scrape indeed.com above, we extract JSON data from the HTML body. We use a regular expression to select the JavaScript variable that holds this JSON data. This variable contains the entire job listing dataset.
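
The extraction step on its own can be sketched with just the standard library, without any scraping API. This is a minimal sketch, not a definitive implementation: the sample HTML is a made-up, heavily trimmed stand-in for a real search page, `extract_jobs` and `normalize_job` are hypothetical helper names, and treating `pubDate` as epoch milliseconds is an assumption based on the sample output shown earlier.

```python
import json
import re
from datetime import datetime, timezone
from urllib.parse import urljoin

# Hypothetical, heavily trimmed stand-in for the data embedded in a search page.
_sample_payload = {
    "metaData": {
        "mosaicProviderJobCardsModel": {
            "results": [
                {
                    "company": "Pythonwise",
                    "normTitle": "Python developer",
                    "pubDate": 1568610000000,
                    "viewJobLink": "/viewjob?jk=1a20a1c56fb7df73",
                }
            ]
        }
    }
}
SAMPLE_HTML = (
    "<html><body><script>"
    'window.mosaic.providerData["mosaic-provider-jobcards"]='
    + json.dumps(_sample_payload)
    + ";</script></body></html>"
)

# Same pattern as in the scrapers above, with the dots escaped.
JOBCARDS_PATTERN = re.compile(
    r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});'
)


def extract_jobs(html: str) -> list:
    """Return the job objects embedded in the page, or [] if the variable is missing."""
    match = JOBCARDS_PATTERN.search(html)
    if match is None:
        return []
    data = json.loads(match.group(1))
    return data["metaData"]["mosaicProviderJobCardsModel"]["results"]


def normalize_job(job: dict) -> dict:
    """Keep a few fields, converting the timestamp and the relative link."""
    # Assumption: pubDate is a Unix timestamp in milliseconds.
    published = datetime.fromtimestamp(job["pubDate"] / 1000, tz=timezone.utc)
    return {
        "title": job["normTitle"],
        "company": job["company"],
        "published": published.date().isoformat(),
        "url": urljoin("https://www.indeed.com", job["viewJobLink"]),
    }


jobs = extract_jobs(SAMPLE_HTML)
print(normalize_job(jobs[0]))
```

Returning an empty list when the variable is missing makes a blocked or changed page easier to tell apart from a parsing crash, which matters given how often the anti-scraping measures change.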

Why scrape Indeed Jobs?

Web scraping indeed.com is a popular use case for job seekers, recruiters, and HR professionals.

With job monitoring scraping we can keep track of job listings and how they change over time, giving insight into market trends. By scraping Indeed.com job search we can also aggregate employment data for specific regions and fields. For example, by scraping "Python Developers in San Francisco" we can keep track of Python opportunities in one particular area and how they change over time.
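
As a rough illustration of such monitoring, the sketch below compares two scraped snapshots of the same search and reports which listings appeared or disappeared. It assumes that the `jk` query parameter in `viewJobLink` identifies a listing; `job_key` and `diff_snapshots` are hypothetical helper names and the snapshot data is made up.

```python
from urllib.parse import parse_qs, urlparse


def job_key(job: dict) -> str:
    """Return the `jk` query parameter of viewJobLink (assumed to identify a listing)."""
    query = parse_qs(urlparse(job["viewJobLink"]).query)
    return query["jk"][0]


def diff_snapshots(previous: list, current: list) -> dict:
    """Compare two lists of job objects and report added and removed job keys."""
    previous_keys = {job_key(job) for job in previous}
    current_keys = {job_key(job) for job in current}
    return {
        "added": sorted(current_keys - previous_keys),
        "removed": sorted(previous_keys - current_keys),
    }


# Made-up snapshots of the same search taken on two different days.
yesterday = [
    {"viewJobLink": "/viewjob?jk=aaa111&from=vjs"},
    {"viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
]
today = [
    {"viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
    {"viewJobLink": "/viewjob?jk=ccc333&from=vjs"},
]

changes = diff_snapshots(yesterday, today)
print(changes)  # {'added': ['ccc333'], 'removed': ['aaa111']}
```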

Indeed data scraping can also be used in market research. It provides not only job listing details but also comprehensive company profile pages, which in combination can be used to create reliable market research graphs and reports.

Indeed.com is often scraped for competitive analysis by recruiters who post their own job listings on the platform, as it can help them optimize those listings for current market trends.

Finally, Indeed contains a lot of user-generated content like company reviews which can be used for sentiment analysis and reputation management as well as AI training.
