How to scrape instagram.com and which web scraping API to use

Instagram is one of the biggest social networks focusing on media sharing and public announcements, making it a popular target for web scraping.

Instagram.com uses proprietary web scraping protection tech that is updated constantly. This makes it difficult to scrape Instagram reliably at scale, and that's where web scraping APIs can really come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for scraping Instagram.com at $2.04 per 1,000 scrape requests on average.

Instagram.com scraping API benchmarks

Scrapeway runs weekly benchmarks for Instagram Pages for the most popular web scraping APIs. Here's the table for this week:

#	Service	Success % (change)	Speed (change)	Cost $/1000 (change)
1	Scrapingant	100% +2	6.3s +0.4	$4.75 =
2	Scrapfly	99% -1	3.9s -1.5	$3.54 -0.21
3	WebScrapingAPI	98% +2	4.8s -0.1	$2.71 =
4	Scrapingbee	94% +14	3.6s -0.5	$3.27 =
5	Scraperapi	0%	-	-
6	Scrapingdog	0%	-	-
7	Zenrows	0%	-	-

Data range Jul 05 - Jul 11
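
As a quick sanity check on the average cost quoted earlier, here is a small sketch that recomputes it from this week's table. It assumes (our reading of the numbers, not something the benchmark states) that services with 0% success are counted as $0 in the average; the `average_cost` helper is made up for illustration.

```python
# Rows copied from the table above: (service, cost in $ per 1,000 requests).
# Services that failed every request list no cost, recorded here as None.
BENCHMARKS = [
    ("Scrapingant", 4.75),
    ("Scrapfly", 3.54),
    ("WebScrapingAPI", 2.71),
    ("Scrapingbee", 3.27),
    ("Scraperapi", None),
    ("Scrapingdog", None),
    ("Zenrows", None),
]


def average_cost(rows, include_failed):
    """Mean cost per 1,000 requests (hypothetical helper).

    When include_failed is True, services without a cost count as $0
    (an assumption about how the headline figure is computed).
    """
    costs = [cost or 0.0 for _, cost in rows if include_failed or cost is not None]
    return sum(costs) / len(costs)


avg_all = average_cost(BENCHMARKS, include_failed=True)
avg_working = average_cost(BENCHMARKS, include_failed=False)
print(f"average over all 7 services: ${avg_all:.2f}")
print(f"average over the 4 working services: ${avg_working:.2f}")
```

Under that assumption the first figure lines up with the quoted average, while the average over only the services that actually returned data is noticeably higher.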

How to scrape instagram.com?

Instagram.com can be surprisingly complex to scrape as it's a giant web app with a GraphQL backend. For people unfamiliar with reverse engineering through the browser's network inspector, it's probably best to pay extra and use a headless browser to fully render pages.

In this case, see web scraping API services that support full browser automation and can click on specific posts and scroll through comments.

That being said, Instagram can be scraped without headless browsers by using its GraphQL backend, as in this Python example for user profile scraping:

Instagram.com scraper

  from pprint import pprint

  from parsel import Selector
  # install using `pip install scrapingant-client`
  from scrapingant_client import ScrapingAntClient

  # create an API client instance
  client = ScrapingAntClient(token="YOUR API KEY")


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.general_request(
          url,
          headers=headers,  # forward custom headers such as x-ig-app-id
          browser=False,
          return_page_source=False,
          proxy_type='residential',
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  from parsel import Selector
  # install using `pip install scrapfly-sdk`
  from scrapfly import ScrapflyClient, ScrapeConfig

  # create an API client instance
  client = ScrapflyClient(key="YOUR API KEY")


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.scrape(ScrapeConfig(
          url=url,
          headers=headers or {},  # forward custom headers such as x-ig-app-id
          asp=True,
          country='US',
      ))
      return api_result.selector


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  # webscrapingapi has a Python SDK but it's not great, use httpx instead:
  # `pip install httpx`
  import httpx
  from parsel import Selector

  # create an HTTP client instance
  client = httpx.Client(timeout=180)


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.get(
          # API endpoint; assumed value, verify it against the WebScrapingAPI docs
          "https://api.webscrapingapi.com/v1",
          headers=headers,
          params={
              "url": url,  # the page to scrape
              "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
              "timeout": 60_000,
              "render_js": "0",
          },
      )
      assert api_result.status_code == 200, api_result.reason_phrase
      return Selector(api_result.text)


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  from parsel import Selector
  # install using `pip install scrapingbee`
  from scrapingbee import ScrapingBeeClient

  # create an API client instance
  client = ScrapingBeeClient(api_key="YOUR API KEY")


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": True,
              "transparent_status_code": True,
              "premium_proxy": "True",
              "render_js": "False",
          },
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data['body'])


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  from parsel import Selector
  # install using `pip install scraperapi`
  from scraper_api import ScraperAPIClient

  # create an API client instance
  client = ScraperAPIClient(api_key="YOUR API KEY")


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.get(
          url=url,
          headers=headers or {},
          premium=True,
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  # scrapingdog has no integration but we can use httpx
  # install using `pip install httpx`
  import httpx
  from parsel import Selector

  # create an HTTP client instance
  client = httpx.Client(timeout=180)


  # create scrape function that returns a parsel Selector for a given URL
  # note: the headers argument is accepted but not forwarded in this example
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      payload = {
          "api_key": "YOUR API KEY",
          "url": url,
          "premium": "true",
      }
      api_result = client.post(
          "https://api.scrapingdog.com/scrape",
          json=payload,
      )
      data = api_result.json()
      assert data['success'], f"scrape failed: {data['message']}"
      return Selector(data['html'])


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

  from pprint import pprint

  from parsel import Selector
  # install using `pip install zenrows`
  from zenrows import ZenRowsClient

  # create an API client instance
  client = ZenRowsClient(apikey="YOUR API KEY")


  # create scrape function that returns a parsel Selector for a given URL
  def scrape(url: str, country: str = "", render_js=False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": "True",
              "premium_proxy": "True",
              "js_render": "False",
          },
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data['html'])


  # this example shows how Instagram can be scraped through its backend API
  username = "google"
  selector = scrape(
      url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
      headers={"x-ig-app-id": "936619743392459"},  # this is needed to access IG backend API
  )
  # this returns a giant JSON dataset with all Instagram profile details
  dataset = selector.get()['data']['user']
  # some examples of what can be found in the dataset:
  pprint(dataset)
  # example output (truncated):
  {
  "biography": "Google unfiltered\u2014sometimes with filters.",
  "external_url": "https://linkin.bio/google",
  "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=AT1fNmnZFR3WyD72UwTTj1nHk-vS6oGaZH57Gwqfq6nj35T_1H3nVcQIphay4l-3qTHSrrBm0QqrKi7TWRjhQEVXLB0VTNYeLNuD_zP-FVr9BOxP",
  "edge_followed_by": {
  "count": 14995780
  },
  "fbid": "17841401778116675",
  "edge_follow": {
  "count": 34
  },
  "full_name": "Google",
  "business_address_json": "{\"city_name\": \"Mountain View, California\", \"city_id\": 108212625870265, \"latitude\": 37.4221, \"longitude\": -122.08432, \"street_address\": \"1600 Amphitheatre Pkwy\", \"zip_code\": \"94043\"}",
  "category_enum": "INTERNET_COMPANY",
  "category_name": "Internetunternehmen",
  "is_verified": true,
  "is_verified_by_mv4b": false,
  "profile_pic_url": "https://instagram.fdtm2-2.fna.fbcdn.net/v/t51.2885-19/425391724_2454717941393726_7200817596193793590_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.fdtm2-2.fna.fbcdn.net&_nc_cat=1&_nc_ohc=p9yxdQR7qOMAb6AQvYj&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfBkva0psR8QgUzFcA0TqYyQNcqt5qBLrxASA62nS2umiw&oe=661EA69C&_nc_sid=8b3546",
  "username": "google",
  ... recent posts and much more
  }

To scrape Instagram.com above, we're calling its backend GraphQL endpoint directly. This provides the entire Instagram post, comment and metadata dataset. Note that we're using the x-ig-app-id header to indicate the client app version when accessing this endpoint.
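
As a rough sketch of the request described above and of what can be done with the response, the snippet below builds the endpoint URL and header, then pulls a few fields out of a profile record. The helper names (`build_profile_request`, `summarize_profile`) are made up for illustration, and the sample record is a shortened copy of the example output shown earlier rather than a live response, so the field layout is an assumption that may change on Instagram's side.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode

IG_APP_ID = "936619743392459"  # header value used in the examples above


def build_profile_request(username: str) -> Tuple[str, Dict[str, str]]:
    """Return the profile endpoint URL and the headers it needs (hypothetical helper)."""
    query = urlencode({"username": username})
    url = f"https://i.instagram.com/api/v1/users/web_profile_info/?{query}"
    return url, {"x-ig-app-id": IG_APP_ID}


def summarize_profile(user: dict) -> dict:
    """Pick a few fields out of the `data.user` record (hypothetical helper)."""
    # business_address_json is itself a JSON-encoded string, so decode it separately
    address_raw = user.get("business_address_json")
    address = json.loads(address_raw) if address_raw else {}
    return {
        "username": user["username"],
        "full_name": user.get("full_name"),
        "verified": user.get("is_verified", False),
        "followers": user["edge_followed_by"]["count"],
        "following": user["edge_follow"]["count"],
        "city": address.get("city_name"),
    }


# shortened copy of the example output shown earlier, not a live response
sample_user = {
    "username": "google",
    "full_name": "Google",
    "is_verified": True,
    "edge_followed_by": {"count": 14995780},
    "edge_follow": {"count": 34},
    "business_address_json": '{"city_name": "Mountain View, California", "zip_code": "94043"}',
}

url, headers = build_profile_request("google")
summary = summarize_profile(sample_user)
print(url)
print(headers)
print(summary)
```

The URL and headers returned by `build_profile_request` can be handed to any of the `scrape` functions above; `summarize_profile` then works on the `dataset` value they produce.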

Why scrape Instagram Pages?

Web scraping Instagram.com is a popular use case for following social signals and announcements, but there are many less obvious uses like tracking e-commerce movements and AI training.

With Instagram signal monitoring we can keep track of certain channels for important announcements and other signals. This data is important in market trend estimation or even stock movement predictions, as Instagram is often one of the first announcement sources.

Increasingly, Instagram data scraping is also used in market research. As IG is becoming a major advertisement and e-commerce hub, we can track various e-commerce data points like advertisements, product sentiment and pricing.

Finally, Instagram is a vast ocean of user-generated content, from brands to artist creations and real-life photographs, making it a popular target for AI training.
