Indeed is one of the biggest job listing and recruitment portals in the world.
Indeed.com uses proprietary, constantly updated web scraping protection together
with Cloudflare's anti-bot service.
This makes Indeed data difficult to scrape reliably, which is where web scraping APIs come in handy.
Overall, most web scraping APIs we've tested through our benchmarks
perform well for scraping Indeed.com, at an average cost of $6.52 per 1,000 scrape requests.
Indeed.com scraping API benchmarks
Scrapeway runs bi-weekly benchmarks for Indeed Jobs against the most popular web scraping APIs. Here's the ranking for this period:
Web scraping API benchmark for indeed.com: success rate, speed, cost per 1,000 requests. Data: 2026-05-02 to 2026-05-08.
Indeed.com pages are relatively easy to scrape once fetched, as they are mostly static content with
very few dynamic elements, so a headless browser is not required.
That being said, Indeed.com has several anti-scraping technologies in place, so it's recommended to use
a reliable web scraping service that can bypass the constantly changing anti-scraping measures.
See benchmarks for the most up-to-date results.
Indeed's HTML pages are well structured and minimal, so they can be easily parsed using
traditional HTML parsing tools like XPath or CSS selectors. That is often unnecessary, though,
as the entire page dataset is available in JSON variables like _initialData.
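As a rough illustration of reading such an embedded JSON variable, here is a minimal sketch that uses only the Python standard library. The page fragment, the field names inside it, and the helper name `extract_js_variable` are made up for this example and are not Indeed's actual markup. It locates the assignment with a regular expression and then lets `json.JSONDecoder.raw_decode` parse exactly one JSON value, which tends to be more forgiving of nested braces than a non-greedy pattern.

```python
import json
import re


def extract_js_variable(html: str, name: str) -> dict:
    """Return the JSON object assigned to a JavaScript variable in `html`."""
    # find `<name> = {` and stop right before the opening brace
    match = re.search(re.escape(name) + r"\s*=\s*(?=\{)", html)
    if match is None:
        raise ValueError(f"variable {name!r} not found in page")
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever JavaScript follows it
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# hypothetical page fragment, for illustration only
sample_html = """
<script>
window._initialData = {"jobTitle": "Python Developer", "location": {"city": "Seattle", "state": "WA"}};
window.other = 1;
</script>
"""
data = extract_js_variable(sample_html, "_initialData")
print(data["jobTitle"], data["location"]["city"])
```

The same helper could be pointed at any other variable name found in the page source; which variables exist on a given page is something to confirm by inspecting the HTML.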
Code example
indeed_scraper.py
import json
from pprint import pprint

from parsel import Selector
# install using `pip install scrapfly-sdk`
from scrapfly import ScrapflyClient, ScrapeConfig

# create an API client instance
client = ScrapflyClient(key="YOUR API KEY")


# create scrape function that returns HTML parser for a given URL
# NOTE: country and headers are accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.scrape(ScrapeConfig(
        url=url,
        asp=True,
        render_js=render_js,
        cache=False,
        debug=True,
        method='GET',
    ))
    return api_result.selector


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

Output
$ python indeed_scraper.py
15
{
 'applyCount': 0,
 'company': 'Pythonwise',
 'companyRating': 0,
 'companyReviewCount': 0,
 'createDate': 1568635928000,
 'jobLocationCity': 'Seattle',
 'jobLocationState': 'WA',
 'normTitle': 'Python developer',
 'organicApplyStartCount': 1493,
 'pubDate': 1568610000000,
 'rankingScoresModel': {'bid': 0, 'eApply': 0.015428938, 'eQualified': 0},
 'salarySnippet': {'currency': '', 'salaryTextFormatted': False},
 'snippet': '<ul style="list-style-type:circle;margin-top: 0px;margin-bottom: '
            '0px;padding-left:20px">\n'
            '<li>We are looking for a <b>Python</b> Web Developer responsible '
            'for developing, enhancing, modifying, maintaining applications '
            'and managing the interchange of data…</li>\n'
            '</ul>',
 'sourceId': 14854320,
 'sponsored': False,
 'viewJobLink': '/viewjob?jk=1a20a1c56fb7df73&from=vjs&tk=1hr8aknntirln85c&viewtype=embedded&xkcb=SoCQ67M3CoddJxwBkx0LbzkdCdPP&continueUrl=%2Fjobs%3Fq%3Dpython%26l%3DSeattle%252C%2BWA',
 # ... and much more
}

import json
from pprint import pprint

from parsel import Selector
# install using `pip install scraperapi`
from scraper_api import ScraperAPIClient

# create an API client instance
client = ScraperAPIClient(api_key="YOUR API KEY")


# create scrape function that returns HTML parser for a given URL
# NOTE: country is accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.get(
        url=url,
        headers=headers or {},
        render=render_js,
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

import json
from pprint import pprint

from parsel import Selector
# install using `pip install zenrows`
from zenrows import ZenRowsClient

# create an API client instance
client = ZenRowsClient(apikey="YOUR API KEY")


# create scrape function that returns HTML parser for a given URL
# NOTE: country and render_js are accepted but not forwarded in this example;
# JavaScript rendering and premium proxies are always enabled below
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": "True",
            "js_render": "True",
            "premium_proxy": "True",
        },
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['html'])


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

import json
from pprint import pprint

from parsel import Selector
# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)
# NOTE: this endpoint is an assumption; confirm it against your WebScrapingAPI dashboard
API_ENDPOINT = "https://api.webscrapingapi.com/v1"


# create scrape function that returns HTML parser for a given URL
# NOTE: country and render_js are accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.get(
        API_ENDPOINT,  # call the API, which fetches the target url for us
        headers=headers,
        params={
            "url": url,
            "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
            "timeout": 60_000,
            "render_js": "False",
        },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

import json
from pprint import pprint

from parsel import Selector
# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient

# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")


# create scrape function that returns HTML parser for a given URL
# NOTE: country and render_js are accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
        },
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['body'])


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

import json
from pprint import pprint

from parsel import Selector
# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient

# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")


# create scrape function that returns HTML parser for a given URL
# NOTE: country, render_js and headers are accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    api_result = client.general_request(
        url,
        json=True,
        proxy_type='residential',
        method='GET',
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

import json
from pprint import pprint

from parsel import Selector
# scrapingdog has no integration but we can use httpx
# install using `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)


# create scrape function that returns HTML parser for a given URL
# NOTE: country, render_js and headers are accepted but not forwarded in this example
def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])


# example search page url:
url = "https://www.indeed.com/jobs?q=python&l=Seattle%2C%20WA"
selector = scrape(url, country="US")
# Indeed jobs can be found in a JavaScript variable as an array of job objects:
data = selector.re(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});')
data = json.loads(data[0])
jobs = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(len(jobs))
pprint(jobs[0])

In the indeed.com examples above, we extract JSON data embedded in the HTML body. A
regular expression selects the JavaScript variable that holds this JSON, and that variable contains the
entire job listing dataset.
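Once extracted, the raw job objects usually need light normalization before analysis. The sketch below is one possible approach, not part of any provider's SDK: it assumes the `pubDate` values shown in the output are epoch milliseconds (their magnitude suggests so) and that `viewJobLink` is relative to https://www.indeed.com. The helper name `normalize_job` and the shape of the resulting record are illustrative choices, and the sample link is shortened from the output above.

```python
from datetime import datetime, timezone
from urllib.parse import urljoin

BASE_URL = "https://www.indeed.com"


def normalize_job(job: dict) -> dict:
    """Flatten one raw job object into a small, analysis-friendly record."""
    location_parts = (job.get("jobLocationCity"), job.get("jobLocationState"))
    return {
        "title": job.get("normTitle"),
        "company": job.get("company"),
        "location": ", ".join(part for part in location_parts if part),
        # assumption: timestamps are epoch milliseconds, so divide by 1000
        "published": datetime.fromtimestamp(job["pubDate"] / 1000, tz=timezone.utc),
        # turn the relative link into an absolute URL
        "url": urljoin(BASE_URL, job["viewJobLink"]),
        "sponsored": bool(job.get("sponsored")),
    }


# a trimmed copy of the job object from the output above
raw_job = {
    "company": "Pythonwise",
    "jobLocationCity": "Seattle",
    "jobLocationState": "WA",
    "normTitle": "Python developer",
    "pubDate": 1568610000000,
    "sponsored": False,
    "viewJobLink": "/viewjob?jk=1a20a1c56fb7df73&from=vjs",
}
record = normalize_job(raw_job)
print(record["published"].isoformat(), record["url"])
```

Keeping timestamps timezone-aware (UTC here) avoids ambiguity when records from different scrape runs are compared later.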
Why scrape Indeed Jobs?
Web scraping indeed.com is a popular use case for job seekers, recruiters, and HR professionals.
By monitoring job listings through scraping, we can keep track of listings and how they change over time,
giving insight into market trends. By scraping Indeed.com job search we can also aggregate employment
data for specific regions and job categories. For example, by scraping "Python Developers in San Francisco" we can keep track
of Python opportunities in one particular area and how they change over time.
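Tracking change over time can be as simple as comparing two scrapes of the same search. The following is a minimal sketch under one assumption: that the `jk` query parameter in each job's `viewJobLink` identifies a listing consistently between runs. The helper names and the sample snapshots are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def job_key(job: dict) -> str:
    """Use the `jk` query parameter of viewJobLink as the listing identifier."""
    query = parse_qs(urlparse(job["viewJobLink"]).query)
    return query["jk"][0]


def diff_snapshots(previous: list, current: list) -> dict:
    """Compare two scrapes of the same search and report added/removed jobs."""
    old = {job_key(job): job for job in previous}
    new = {job_key(job): job for job in current}
    return {
        # sort keys so the result order is deterministic
        "added": [new[key] for key in sorted(new.keys() - old.keys())],
        "removed": [old[key] for key in sorted(old.keys() - new.keys())],
    }


# hypothetical snapshots from two consecutive days
yesterday = [
    {"viewJobLink": "/viewjob?jk=aaa111&from=vjs"},
    {"viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
]
today = [
    {"viewJobLink": "/viewjob?jk=bbb222&from=vjs"},
    {"viewJobLink": "/viewjob?jk=ccc333&from=vjs"},
]
changes = diff_snapshots(yesterday, today)
print([job_key(job) for job in changes["added"]])
print([job_key(job) for job in changes["removed"]])
```

Storing each day's snapshot (for example as a JSON file) and diffing consecutive pairs would give a simple time series of listings opened and closed.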
Indeed data scraping can also be used in market research. It provides not only job listing details
but comprehensive company profile pages that in combination can be used to create reliable
market research graphs and reports.
Indeed.com is often scraped by recruiters who post their own job listings on the platform and use the data for
competitive analysis, as it can help optimize job listings for current market trends.
Finally, Indeed contains a lot of user-generated content like company reviews which can be used
for sentiment analysis and reputation management as well as AI training.