How to scrape twitter.com and which web scraping API to use

X.com (formerly Twitter) is one of the biggest social networks out there and a popular web scraping target for tracking social signals and announcements.

X.com uses proprietary web scraping protection mechanisms that are constantly evolving. This makes it difficult to scrape Twitter data reliably, which is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for X.com at $1.67 per 1,000 scrape requests on average.

Twitter.com scraping API benchmarks

Scrapeway runs weekly benchmarks for X.com Tweets for the most popular web scraping APIs. Here's the table for this week:

#	Service	Success % (change)	Speed (change)	Cost $/1000 (change)
1	Scrapfly	97% -2	15.4s -0.1	$1.08 +0.1
2	Scrapingdog	95% -5	1.0s +=	- -1.0
3	Zenrows	75% -4	14.3s +1.5	$6.9 =
4	WebScrapingAPI	66% =	11.6s -4.9	$1.85 -0.87
5	Scrapingant	1% -94	58.3s +39.6	$1.9 =
6	Scraperapi	0%	-	-
7	Scrapingbee	0%	-	-

Data range Nov 22 - Nov 28
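
Price and success rate are easier to compare when combined into one number. The sketch below estimates the cost of 1,000 successful scrapes by dividing the listed price by the success rate. It assumes every request is billed whether or not it succeeds, which the benchmark does not state and which differs between providers; the function name is illustrative only.

```python
def cost_per_1000_successes(cost_per_1000: float, success_rate: float) -> float:
    """Estimate the price of 1,000 successful scrapes.

    Assumes failed requests are billed too (an assumption, check each
    provider's billing rules before relying on this number).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in the range (0, 1]")
    return cost_per_1000 / success_rate


# (cost per 1,000 requests, success rate) taken from the table above
benchmarks = {
    "Scrapfly": (1.08, 0.97),
    "Zenrows": (6.90, 0.75),
    "WebScrapingAPI": (1.85, 0.66),
}

effective = {
    name: round(cost_per_1000_successes(cost, rate), 2)
    for name, (cost, rate) in benchmarks.items()
}
for name, value in effective.items():
    print(f"{name}: ${value:.2f} per 1,000 successful scrapes")
```

Services with a 0% success rate are left out because the estimate is undefined for them.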

How to scrape twitter.com?

X.com is relatively difficult to scrape because it's a heavy JavaScript application, so a headless browser is required. All web scraping APIs we've tested support headless browsers, though only some of them provide full browser control.

In addition, Twitter has a lot of anti-scraping mechanisms in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See the benchmarks for the most up-to-date results.

Parsing scraped X.com data with traditional HTML parsing tools like XPath or CSS selectors is relatively easy. Twitter uses `data-testid` attributes extensively throughout its application, which makes it straightforward to locate the data you need in the HTML.
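
To illustrate the idea without any scraping service, here is a minimal sketch that collects text by `data-testid` using only Python's standard library. The sample markup and the `DataTestIdParser` class are made up for illustration; real X.com pages are far more deeply nested and their attribute values may differ.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup (not copied from X.com)
SAMPLE_HTML = """
<article>
  <div data-testid="tweetText"><span>Hello from a sample tweet</span></div>
  <div data-testid="like"><span>725</span></div>
  <div data-testid="retweet"><span>127</span></div>
</article>
"""

# tags that never have a closing tag and so must not be tracked on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DataTestIdParser(HTMLParser):
    """Collect the text inside every element that carries a data-testid."""

    def __init__(self):
        super().__init__()
        self.results = {}  # data-testid value -> collected text
        self._stack = []   # one entry per open tag: its data-testid or None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(dict(attrs).get("data-testid"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # attribute the text to the closest enclosing data-testid element
        for test_id in reversed(self._stack):
            if test_id:
                self.results[test_id] = self.results.get(test_id, "") + text
                break


parser = DataTestIdParser()
parser.feed(SAMPLE_HTML)
print(parser.results)
```

In practice a selector library such as parsel, used in the scrapers in this article, does the same lookup in one line; the point here is only that stable `data-testid` values make the target elements easy to address.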

Twitter.com scraper

  # Scrapfly
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapfly-sdk`
  from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

  # create an API client instance
  client = ScrapflyClient(key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country and headers are kept for a uniform signature but are not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result: ScrapeApiResponse = client.scrape(ScrapeConfig(
          url=url,
          asp=True,  # enable anti scraping protection bypass
          render_js=render_js,  # render the page in a headless browser
          cache=False,
          cache_ttl=900,
          method="GET",
      ))
      return api_result.selector

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # Scrapingdog
  from pprint import pprint
  from parsel import Selector
  # scrapingdog has no integration but we can use httpx
  # install using `pip install httpx`
  import httpx

  # create an API client instance
  client = httpx.Client(timeout=180)

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country, render_js and headers are kept for a uniform signature but are not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      payload = {
          "api_key": "YOUR API KEY",
          "url": url,
      }
      api_result = client.post(
          "https://api.scrapingdog.com/scrape",
          json=payload,
      )
      data = api_result.json()
      assert data["success"], f"scrape failed: {data['message']}"
      return Selector(data["html"])

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # Zenrows
  from pprint import pprint
  from parsel import Selector
  # install using `pip install zenrows`
  from zenrows import ZenRowsClient

  # create an API client instance
  client = ZenRowsClient(apikey="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  # NOTE: JavaScript rendering is always on here; country and render_js are not forwarded
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": "True",
              "js_render": "True",
              "premium_proxy": "True",
              "wait": "5000",
          },
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data["html"])

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # WebScrapingAPI
  from pprint import pprint
  from parsel import Selector
  # webscrapingapi has a Python SDK but it's not great, use httpx instead:
  # `pip install httpx`
  import httpx

  # create an API client instance
  client = httpx.Client(timeout=180)
  # NOTE: the endpoint below is an assumption, confirm it against the WebScrapingAPI docs
  API_ENDPOINT = "https://api.webscrapingapi.com/v1"

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country is kept for a uniform signature but is not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          API_ENDPOINT,  # call the API, which fetches the target url for us
          headers=headers,
          params={
              "url": url,
              "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
              "timeout": 60_000,
              "render_js": 1 if render_js else 0,  # assumed 0/1 flag for headless rendering
          },
      )
      assert api_result.status_code == 200, api_result.reason_phrase
      return Selector(api_result.text)

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # Scrapingant
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapingant-client`
  from scrapingant_client import ScrapingAntClient

  # create an API client instance
  client = ScrapingAntClient(token="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country, render_js and headers are kept for a uniform signature but are not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.general_request(
          url,
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # Scraperapi
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scraperapi`
  from scraper_api import ScraperAPIClient

  # create an API client instance
  client = ScraperAPIClient(api_key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country and render_js are kept for a uniform signature but are not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url=url,
          headers=headers or {},
      )
      assert api_result.ok, api_result.text
      return Selector(api_result.text)

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

  # Scrapingbee
  from pprint import pprint
  from parsel import Selector
  # install using `pip install scrapingbee`
  from scrapingbee import ScrapingBeeClient

  # create an API client instance
  client = ScrapingBeeClient(api_key="YOUR API KEY")

  # create scrape function that returns HTML parser for a given URL
  # NOTE: country and render_js are kept for a uniform signature but are not forwarded here
  def scrape(url: str, country: str = "", render_js: bool = False, headers: dict = None) -> Selector:
      api_result = client.get(
          url,
          headers=headers,
          params={
              "json_response": True,
              "transparent_status_code": True,
          },
      )
      assert api_result.ok, api_result.text
      data = api_result.json()
      return Selector(data["body"])

  url = "https://twitter.com/XCreators/status/1770093017506189440"
  selector = scrape(url, render_js=True, country="US")

  # Twitter can be parsed using css selectors and data-testid attributes
  views, reposts, quotes, likes, bookmarks, *_ = selector.css(
      "[data-testid=app-text-transition-container] span::text"
  ).getall()
  data = {
      "tweet": selector.css("[data-testid=tweetText] ::text").get(),
      "views": views,
      "reposts": reposts,
      "quotes": quotes,
      "likes": likes,
      "bookmarks": bookmarks,
  }
  pprint(data)

Example output:

  {'bookmarks': '44',
   'likes': '725',
   'quotes': '24',
   'reposts': '127',
   'tweet': 'X is the platform for content creators to freely express their '
            'artistic and diverse perspectives without the constraints of '
            'censorship. Since the introduction of our ad revenue share program, '
            'X has paid out an impressive sum of more than $45 million to more '
            'than 150,000 creators.',
   'views': '530.8K'}

In the X.com scrapers above, we use a headless browser to render the Twitter web app. Then, to find elements within the HTML, the data-testid attributes come in handy, as these are what Twitter uses for its own headless test browsers.
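
The scraped counters arrive as display strings such as '530.8K', so they usually need normalizing before analysis. Below is a small sketch of one way to do that. The `parse_count` helper is hypothetical, and the K/M/B suffix set is an assumption based on the sample output rather than a documented X.com format.

```python
def parse_count(text: str) -> int:
    """Convert a display counter such as '530.8K' or '1,204' to an integer."""
    # assumed suffixes; extend if other abbreviations show up in scraped data
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    cleaned = text.strip().replace(",", "").upper()
    if not cleaned:
        raise ValueError("empty counter")
    suffix = cleaned[-1]
    if suffix in multipliers:
        # round() absorbs floating point noise from the multiplication
        return round(float(cleaned[:-1]) * multipliers[suffix])
    return int(cleaned)


# values taken from the example output above
raw = {"views": "530.8K", "likes": "725", "reposts": "127", "bookmarks": "44"}
counts = {name: parse_count(value) for name, value in raw.items()}
print(counts)
```

Abbreviated values are already rounded by the site, so the converted numbers are approximations of the real counts.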

Why scrape X.com Tweets?

X.com is a popular target for web scraping because it has a large amount of social signal data that can be used for various purposes like signal analysis and sentiment analysis.

With announcement monitoring, we can track certain Twitter accounts for new announcements or changes in sentiment that can be used for trading or marketing purposes.

Market research can also be done by scraping Twitter data to identify trends and sentiment around certain products or services.

Another popular use case for X.com scraping is competition tracking by scraping competitor post performance and follower gains.

Finally, Twitter contains a lot of text data which can be used in AI training.
