Web scraping twitter.com

Last updated: 2024-04-08

X.com (formerly Twitter) is one of the biggest social networks out there and a popular web scraping target for tracking social signals and announcements.

X.com uses proprietary anti-scraping mechanisms that are constantly evolving, which makes it difficult to scrape Twitter data reliably. This is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for X.com at $1.64 per 1,000 scrape requests on average.

Twitter.com scraping API benchmarks

Scrapeway runs bi-weekly benchmarks for X.com Tweets against the most popular web scraping APIs. Here's the ranking for this period:

Web scraping API benchmark for twitter.com — success rate, speed, cost per 1,000 requests. Data: 2026-06-13 to 2026-06-19.
#	Service	Success	Speed	Cost/1k
1 🥇	Scrapfly	98% =	30.9s +15.6	$0.61 -0.29	(237) ★ 4.9
2 🥈	Scrapingdog	95% -1	1.2s -0.2	$1.0 =	—
3 🥉	Zenrows	49% -29	11.8s +0.1	$6.9 =	(103) ★ 4.8
4	WebScrapingAPI	49% -43	14.6s -1.5	$2.71 =	—
5	Scrapingant	38% -52	9.8s -0.9	$1.9 =	—
6	Scrapingbee	0% —	— —	— —	(137) ★ 4.9
7	Firecrawl	0% —	— —	— —	—
8	Scraperapi	0% —	— —	— —	(62) ★ 4.6

How to scrape twitter.com?

X.com is relatively difficult to scrape: it is a heavy JavaScript application, so a headless browser is required to render it.

In addition, Twitter has many anti-scraping mechanisms in place, so it's recommended to use a reliable web scraping service that can bypass these constantly changing measures. See the benchmarks above for the most up-to-date results.
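Even with a scraping API, individual requests can fail, as the success rates in the benchmark show. A minimal sketch of retrying with exponential backoff follows. `with_retries` and its parameters are hypothetical helper names, not part of any scraping SDK, and real code would catch the client's specific exceptions rather than the broad `Exception` used here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds or `attempts` calls have failed.

    The wait doubles after every failure: base_delay, 2*base_delay, ...
    `sleep` is injectable so the helper can be tested without waiting.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:  # assumption: narrow this in real code
            last_error = error
            if attempt < attempts - 1:  # no need to wait after the last try
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

It could wrap any of the `scrape` functions in the code examples, for instance `with_retries(lambda: scrape(url, render_js=True, country="US"))`.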

Parsing scraped X.com data with traditional HTML parsing tools like XPath or CSS selectors, on the other hand, is relatively easy. Twitter uses `data-testid` attributes extensively throughout its application, so it's easy to locate the data you need in the HTML.
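To illustrate the idea without any network access, here is a dependency-free sketch that pulls text out of elements by their `data-testid` value using Python's built-in `html.parser`. The sample markup is hypothetical and heavily simplified, and `text_by_testid` is an illustrative helper name. With parsel, the same lookup is a one-line CSS selector.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; real X.com pages are far larger.
SAMPLE_HTML = """
<article>
  <div data-testid="tweetText"><span>Hello </span><span>world</span></div>
  <div data-testid="app-text-transition-container"><span>530.8K</span></div>
  <div data-testid="app-text-transition-container"><span>127</span></div>
</article>
"""


class TestIdText(HTMLParser):
    """Collect the text of every element whose data-testid equals `test_id`."""

    def __init__(self, test_id: str):
        super().__init__()
        self.test_id = test_id
        self.depth = 0      # > 0 while we are inside a matching element
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # nested tag inside a match; void tags like <br> are not handled
            self.depth += 1
        elif dict(attrs).get("data-testid") == self.test_id:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def text_by_testid(html: str, test_id: str) -> list:
    parser = TestIdText(test_id)
    parser.feed(html)
    return parser.results


print(text_by_testid(SAMPLE_HTML, "tweetText"))
print(text_by_testid(SAMPLE_HTML, "app-text-transition-container"))
```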

Code example

twitter_scraper.py

import json
from parsel import Selector

# install using `pip install scrapfly-sdk`
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

# create an API client instance
client = ScrapflyClient(key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
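def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result: ScrapeApiResponse = client.scrape(ScrapeConfig(
        url=url,
        country=country or None,  # proxy country, e.g. "US"
        render_js=render_js,  # render the page in a headless browser
        headers=headers,
    ))
    return api_result.selector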

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

Output

$ python twitter_scraper.py
{'bookmarks': '44',
 'likes': '725',
 'quotes': '24',
 'reposts': '127',
 'tweet': 'X is the platform for content creators to freely express their '
          'artistic and diverse perspectives without the constraints of '
          'censorship. Since the introduction of our ad revenue share program, '
          'X has paid out an impressive sum of more than $45 million to more '
          'than 150,000 creators.',
 'views': '530.8K'}

import json
from parsel import Selector

# scrapingdog has no integration but we can use httpx
# install using `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
        "api_url": "https://api.scrapingdog.com/x/post",
        "parsed": "False",
        "tweetId": url.rstrip("/").split("/")[-1],  # tweet ID taken from the status URL

    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# install using `pip install zenrows`
from zenrows import ZenRowsClient

# create an API client instance
client = ZenRowsClient(apikey="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": "True",
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['html'])

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx`
import httpx

# create an API client instance
client = httpx.Client(timeout=180)

# create scrape function that returns HTML parser for a given URL
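def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        # send the request to the API endpoint, not to the target page itself
        # NOTE: endpoint is an assumption; verify it against your account docs
        "https://api.webscrapingapi.com/v1",
        headers=headers,
        params={
            "url": url,
            "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
            "timeout": 60_000,
        },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)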

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient

# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.general_request(
        url,
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient

# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['body'])

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    # the API request for this service is not shown here: fetch the rendered
    # page HTML with your client of choice and wrap it in a Selector
    raise NotImplementedError("add the scraping API request for this service")

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

import json
from parsel import Selector

# install using `pip install scraperapi-sdk`
from scraper_api import ScraperAPIClient

# create an API client instance
client = ScraperAPIClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url=url,
        headers=headers or {},
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")

# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)

To scrape X.com above, we use a headless browser to render the Twitter web app. Then, to find elements within the HTML, the data-testid attributes come in handy, as these are what Twitter uses for its own automated browser tests.
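The scraped metrics arrive as display strings such as '530.8K', and the tuple unpacking in the examples raises a `ValueError` when fewer than five values are found. A hedged sketch of a more forgiving post-processing step follows. `parse_count` and `label_metrics` are hypothetical helper names, the `M` and `B` suffixes are assumptions (only `K` appears in the output above), and the positional pairing assumes the page always lists the metrics in the same order.

```python
from decimal import Decimal, InvalidOperation
from itertools import zip_longest

# order assumed to match the page: views, reposts, quotes, likes, bookmarks
METRIC_NAMES = ["views", "reposts", "quotes", "likes", "bookmarks"]
# only "K" is confirmed by the sample output; "M" and "B" are assumptions
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert '530.8K' or '1,204' to an int; None if missing or unparseable."""
    if not text:
        return None
    cleaned = text.strip().replace(",", "").upper()
    multiplier = 1
    if cleaned and cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal avoids float rounding such as 530.8 * 1000 != 530800
        return int(Decimal(cleaned) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


def label_metrics(values):
    """Pair scraped values with metric names; missing trailing values become None."""
    trimmed = list(values)[: len(METRIC_NAMES)]
    return {
        name: parse_count(value)
        for name, value in zip_longest(METRIC_NAMES, trimmed)
    }


print(label_metrics(["530.8K", "127", "24", "725", "44"]))
```

In the examples above, `label_metrics(selector.css(...).getall())` could replace the five-way unpacking, at the cost of silently yielding `None` for metrics the page did not render.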

Why scrape X.com Tweets?

X.com is a popular target for web scraping because it has a large amount of social signal data that can be used for various purposes like signal analysis and sentiment analysis.

With announcement monitoring, we can track certain Twitter accounts for new announcements or changes in sentiment, which can inform trading or marketing decisions.

Market research can also be done by scraping Twitter data to identify trends and sentiment around certain products or services.

Another popular use case for X.com scraping is competition tracking by scraping competitor post performance and follower gains.

Finally, Twitter contains a lot of text data that can be used for AI training.