Instagram is one of the biggest social networks focused on media sharing and public announcements,
making it a popular target for web scraping.
Instagram.com uses proprietary anti-scraping technology that is updated constantly.
This makes it difficult to scrape Instagram reliably at scale, and that's where web scraping APIs
can really come in handy.
Overall, most web scraping APIs we've tested through our benchmarks
perform well for scraping Instagram.com at $1.68 per 1,000 scrape requests on average.
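To put the average price in perspective, here is a minimal sketch of a cost estimate. The workload figures (5,000 profiles a day for 30 days) are hypothetical and only illustrate the arithmetic. Actual prices vary by service and plan.

```python
# average benchmark price quoted above, in USD per 1,000 scrape requests
AVG_PRICE_PER_1000 = 1.68


def estimate_cost(requests: int, price_per_1000: float = AVG_PRICE_PER_1000) -> float:
    """Return the estimated cost in USD for a number of scrape requests."""
    return round(requests / 1000 * price_per_1000, 2)


# hypothetical workload: 5,000 profiles scraped once a day for 30 days
monthly_requests = 5_000 * 30
print(estimate_cost(monthly_requests))
```

Failed requests and retries are not modeled here, so treat the result as a lower bound.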
Instagram.com scraping API benchmarks
Scrapeway runs weekly benchmarks for Instagram Pages for the most popular web scraping APIs.
Here's the table for this week:
Instagram.com can be surprisingly complex to scrape as it's a giant web app with
a GraphQL backend. For people unfamiliar with reverse engineering using
the browser's network inspector, it's probably best to pay extra and use
a headless browser to fully render pages.
In this case, look for web scraping API services that support full browser automation
and can click on specific posts and scroll through comments.
Here's a list of services that support full browser control:
That being said, Instagram can be scraped without the use of headless browsers
by using its GraphQL backend, as in this Python example for user profile scraping:
Instagram.com scraper
import json
from parsel import Selector

# install using `pip install scrapfly-sdk`
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

# create an API client instance
client = ScrapflyClient(key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.scrape(ScrapeConfig(
        url=url,
        headers=headers or {},  # forward custom headers such as x-ig-app-id
        asp=True,
        country='US',
    ))
    return api_result.selector

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient

# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.general_request(
        url,
        headers=headers,  # forward custom headers such as x-ig-app-id
        browser=False,
        return_page_source=False,
        proxy_type='residential',
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient

# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
            "premium_proxy": "True",
            "render_js": "False",
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['body'])

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# install using `pip install scraperapi`
from scraper_api import ScraperAPIClient

# create an API client instance
client = ScraperAPIClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url=url,
        headers=headers or {},
        premium=True,
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# install using `pip install zenrows`
from zenrows import ZenRowsClient

# create an API client instance
client = ZenRowsClient(apikey="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": "True",
            "premium_proxy": "True",
            "js_render": "False",
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['html'])

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# scrapingdog has no integration but we can use httpx
# install using `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
        "premium": "true",
    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
import json
from parsel import Selector

# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        # assumed WebScrapingAPI endpoint (verify against the service docs);
        # the target URL is passed through the "url" parameter below
        "https://api.webscrapingapi.com/v1",
        headers=headers,
        params={
            "url": url,
            "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
            "timeout": 60_000,
            "render_js": "0",
        },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)

# this example shows how Instagram can be scraped through their backend API
username = "google"
selector = scrape(
    url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
)

# this returns a giant JSON dataset with all Instagram profile details
dataset = selector.get()['data']['user']

# some examples of what can be found in the dataset:
from pprint import pprint
pprint(dataset)
{
"biography": "Google unfiltered\u2014sometimes with filters.",
"external_url": "https://linkin.bio/google",
"external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
"edge_followed_by": {
"count": 14995780
},
"fbid": "17841401778116675",
"edge_follow": {
"count": 34
},
"full_name": "Google",
"business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
"category_enum": "INTERNET_COMPANY",
"category_name": "Internetunternehmen",
"is_verified": true,
"is_verified_by_mv4b": false,
"profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
"username": "google",
... recent posts and much more
}
To scrape Instagram.com in the examples above, we call its backend GraphQL endpoint directly. This provides
the entire Instagram post, comment and metadata dataset. Note that we send the x-ig-app-id header,
which identifies the client app, to access this endpoint.
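Once the dataset is retrieved, individual fields can be pulled out of it. The following is a minimal sketch based only on the field names visible in the example output above. The `summarize_profile` helper is a hypothetical name introduced for illustration, and the sample is a trimmed copy of that output rather than a live response.

```python
import json


def summarize_profile(user: dict) -> dict:
    """Pick a few fields out of the `data.user` object shown in the example output."""
    # business_address_json is itself a JSON-encoded string, so decode it separately
    address_raw = user.get("business_address_json")
    address = json.loads(address_raw) if address_raw else {}
    return {
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
        "verified": user.get("is_verified", False),
        "city": address.get("city_name"),
    }


# trimmed sample copied from the example output (not a live response)
sample = {
    "username": "google",
    "full_name": "Google",
    "edge_followed_by": {"count": 14995780},
    "edge_follow": {"count": 34},
    "is_verified": True,
    "business_address_json": '{"city_name": "Mountain View, California", "zip_code": "94043"}',
}
summary = summarize_profile(sample)
print(summary)
```

Using `.get()` with defaults keeps the helper from raising when a field is missing, which matters because the response shape is undocumented and may change.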
Why scrape Instagram Pages?
Web scraping Instagram.com is a popular use case for following social signals and announcements
but there are many less obvious uses like tracking e-commerce movements and AI training.
With Instagram signal monitoring, we can keep track of certain channels for important
announcements and other signals. This data is important in market trend estimation or even stock
movement predictions, as Instagram is often one of the first announcement sources.
Increasingly, Instagram data scraping can also be used in market research.
As IG is becoming a major advertisement and e-commerce hub, we can track various e-commerce
data points like advertisements, product sentiment and pricing.
Finally, Instagram is a vast ocean of user-generated content, from brands to artist creations
and real-life photographs, making it a popular target for AI training.