X.com (formerly Twitter) is one of the largest social networks and a
popular web scraping target for tracking social signals and announcements.
X.com uses proprietary anti-scraping mechanisms that are constantly evolving,
which makes it difficult to scrape Twitter data reliably. This is where web scraping APIs come in handy.
Overall, most web scraping APIs we've tested in our benchmarks
perform well for X.com, at an average cost of $1.69 per 1,000 scrape requests.
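To put the average price in perspective, here is a rough sketch of the budget arithmetic. Only the $1.69 per 1,000 requests figure comes from the benchmarks; the function name `estimate_cost` and the optional `success_rate` adjustment are illustrative assumptions, and whether failed requests are billed varies by provider.

```python
from decimal import Decimal, ROUND_HALF_UP

# Average benchmark price quoted above: $1.69 per 1,000 scrape requests.
AVERAGE_PRICE_PER_1000 = Decimal("1.69")


def estimate_cost(pages: int,
                  price_per_1000: Decimal = AVERAGE_PRICE_PER_1000,
                  success_rate: Decimal = Decimal("1")) -> Decimal:
    """Estimate the cost in USD of collecting `pages` successful scrapes.

    `success_rate` is a hypothetical input in (0, 1]: at 0.8, roughly
    pages / 0.8 requests are needed. This assumes failed requests are
    billed, which is not true for every provider.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    requests_needed = Decimal(pages) / success_rate
    cost = requests_needed / Decimal(1000) * price_per_1000
    # Round to whole cents.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(100_000))                               # every request succeeds
print(estimate_cost(100_000, success_rate=Decimal("0.8")))  # one in five fails
```

Using `Decimal` rather than floats keeps the currency math exact.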
Twitter.com scraping API benchmarks
Scrapeway runs bi-weekly benchmarks for X.com Tweets against the most popular web scraping APIs. Here's the ranking for this period:
Web scraping API benchmark for twitter.com — success rate, speed, cost per 1,000 requests. Data: 2026-05-02 to 2026-05-08.
X.com is relatively difficult to scrape because it's a JavaScript-heavy application,
so a headless browser is required to render it.
In addition, Twitter has many anti-scraping mechanisms in place, so it's recommended to use
a reliable web scraping service that can bypass these constantly changing measures.
See benchmarks for the most up-to-date results.
Parsing scraped X.com data with traditional HTML parsing tools like XPath or CSS selectors, on the other hand,
is relatively easy. Twitter uses `data-testid` markup extensively throughout its application, which means
it's easy to select the data you need from the HTML.
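As a small illustration of selecting by `data-testid`, the sketch below pulls the text out of elements carrying a given attribute value. It deliberately uses the standard library's `html.parser` instead of parsel so it runs without extra packages; the `DataTestIdParser` class and the sample HTML are made up for this example and are much simpler than real X.com markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class DataTestIdParser(HTMLParser):
    """Collect the text inside every element with a given data-testid."""

    def __init__(self, test_id: str):
        super().__init__()
        self.test_id = test_id
        self.depth = 0      # > 0 while inside a matching element
        self.results = []   # one text string per matching element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif dict(attrs).get("data-testid") == self.test_id:
            self.depth = 1   # entering a matching element
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # left the matching element
            self.results.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self._chunks.append(data)


def text_by_testid(html: str, test_id: str) -> list:
    parser = DataTestIdParser(test_id)
    parser.feed(html)
    return parser.results


# Simplified, hypothetical markup for demonstration only.
SAMPLE_HTML = """
<article data-testid="tweet">
  <div data-testid="tweetText"><span>Hello </span><span>world</span></div>
  <div data-testid="app-text-transition-container"><span>530.8K</span></div>
</article>
"""

print(text_by_testid(SAMPLE_HTML, "tweetText"))
print(text_by_testid(SAMPLE_HTML, "app-text-transition-container"))
```

With parsel, the same lookup is a one-line CSS selector such as `[data-testid=tweetText] ::text`, which is what the provider examples in this article use.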
Code example
twitter_scraper.py
Output
$ python twitter_scraper.py
{'bookmarks': '44',
 'likes': '725',
 'quotes': '24',
 'reposts': '127',
 'tweet': 'X is the platform for content creators to freely express their '
          'artistic and diverse perspectives without the constraints of '
          'censorship. Since the introduction of our ad revenue share program, '
          'X has paid out an impressive sum of more than $45 million to more '
          'than 150,000 creators.',
 'views': '530.8K'}
import json
from parsel import Selector
# install using `pip install scrapfly-sdk`
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse
# create an API client instance
client = ScrapflyClient(key="YOUR API KEY")
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.scrape(ScrapeConfig(
        url=url,  # scrape the URL passed by the caller (a second hard-coded `url=` was a syntax error)
        asp=True,
        render_js=render_js,  # X.com needs JavaScript rendering, so callers pass True
        cache=False,
        cache_ttl=900,
        country=country.lower() or None,
        rendering_stage='domcontentloaded',
        method='GET',
    ))
    return api_result.selector
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# scrapingdog has no integration but we can use httpx
# install using `pip install httpx`
import httpx
# create an API client instance
client = httpx.Client(timeout=180)
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    payload = {
        "api_key": "YOUR API KEY",
        "url": url,
        "api_url": "https://api.scrapingdog.com/x/post",
        "parsed": "False",
        # take the tweet ID from the requested URL instead of hard-coding it
        "tweetId": url.rstrip("/").split("/")[-1],
    }
    api_result = client.post(
        "https://api.scrapingdog.com/scrape",
        json=payload,
    )
    data = api_result.json()
    assert data['success'], f"scrape failed: {data['message']}"
    return Selector(data['html'])
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# webscrapingapi has a Python SDK but it's not great, use httpx instead:
# `pip install httpx`
import httpx
# create an API client instance
client = httpx.Client(timeout=180)
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        # NOTE: the request must go to the API endpoint, not to the target URL.
        # This endpoint is an assumption; confirm it in the WebScrapingAPI docs.
        "https://api.webscrapingapi.com/v1",
        headers=headers,
        params={
            "url": url,  # target page; the duplicate hard-coded "url" key was removed
            "api_key": "YOUR API KEY",  # NOTE: add your API KEY here!
            "timeout": 60_000,
            "render_js": str(render_js),
            "method": "GET",
        },
    )
    assert api_result.status_code == 200, api_result.reason_phrase
    return Selector(api_result.text)
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# install using `pip install zenrows`
from zenrows import ZenRowsClient
# create an API client instance
client = ZenRowsClient(apikey="YOUR API KEY")
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,  # target page; the hard-coded "url" param that overrode it was removed
        headers=headers,
        params={
            "json_response": "True",
            "js_render": "True",
            "premium_proxy": "True",
            "wait": "5000",
            "method": "GET",
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['html'])
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient
# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.general_request(
        url,  # pass the target URL once; a second `url=` keyword raises a TypeError
        json=True,
        method='GET',
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient
# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
        }
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data['body'])
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
import json
from parsel import Selector
# install using `pip install scraperapi`
from scraper_api import ScraperAPIClient
# create an API client instance
client = ScraperAPIClient(api_key="YOUR API KEY")
# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.get(
        url=url,
        headers=headers or {},
    )
    assert api_result.ok, api_result.text
    return Selector(api_result.text)
url = "https://twitter.com/XCreators/status/1770093017506189440"
selector = scrape(url, render_js=True, country="US")
# Twitter can be parsed using css selectors and data-testid attributes
views, reposts, quotes, likes, bookmarks, *_ = selector.css('[data-testid=app-text-transition-container] span::text').getall()
data = {
    "tweet": selector.css("[data-testid=tweetText] ::text").get(),
    "views": views,
    "reposts": reposts,
    "quotes": quotes,
    "likes": likes,
    "bookmarks": bookmarks,
}
from pprint import pprint
pprint(data)
To scrape X.com in the examples above, we use a headless browser to render the Twitter web app.
Then, to find elements within the HTML, the data-testid attributes come in handy, as these
are what Twitter uses for its own headless browser tests.
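The scraped counters come back as display strings such as '530.8K', and the positional unpacking in the examples above fails if fewer than five counters are present. The sketch below shows one way to handle both. The helper names `parse_count` and `label_counters` are hypothetical, and the K/M/B suffix format is assumed from the sample output rather than from any X.com specification.

```python
from decimal import Decimal

# Order in which the scraper examples above unpack the counters.
COUNTER_NAMES = ("views", "reposts", "quotes", "likes", "bookmarks")

# Multipliers for abbreviated counts (assumed display format, e.g. "530.8K").
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a displayed count such as '725', '1,204' or '530.8K' to an int."""
    cleaned = text.strip().replace(",", "").upper()
    multiplier = 1
    if cleaned and cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    # Decimal avoids binary floating-point rounding surprises.
    return int(Decimal(cleaned) * multiplier)


def label_counters(values: list) -> dict:
    """Pair scraped counter strings with names; missing counters become None."""
    padded = list(values[:len(COUNTER_NAMES)])
    padded += [None] * (len(COUNTER_NAMES) - len(padded))
    return {
        name: parse_count(value) if value is not None else None
        for name, value in zip(COUNTER_NAMES, padded)
    }


print(label_counters(["530.8K", "127", "24", "725", "44"]))
print(label_counters(["530.8K", "127"]))  # page rendered with fewer counters
```

Abbreviated values are rounded by the site before display, so the converted numbers are approximations of the real counts.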
Why scrape X.com Tweets?
X.com is a popular target for web scraping because it has a large amount of
social signal data that can be used for purposes like signal analysis and sentiment analysis.
By scraping for announcement monitoring, we can track certain Twitter accounts for new announcements
or changes in sentiment that can be used for trading or marketing purposes.
Market research can also be done by scraping Twitter data to identify trends and
sentiment around certain products or services.
Another popular use case for X.com scraping is competition tracking by scraping competitor
post performance and follower gains.
Finally, Twitter contains a lot of text data which can be used in AI training