Web Scraping Linkedin.com Overview

2024-04-08

LinkedIn is by far the biggest career-focused social network and thus contains an incredible amount of job-related data. Job listings, company profiles, and public CVs of individuals are all very popular web scraping targets.

LinkedIn uses its own proprietary anti-scraping technology, which is constantly updated and is one of the toughest to bypass. This makes LinkedIn pages difficult to scrape, and this is where the value of web scraping APIs truly shows.

Overall, few of the web scraping APIs we've tested in our benchmarks can scrape LinkedIn reliably, and those that can set the average price at $11.48 per 1,000 scrape requests.

Linkedin.com scraping API benchmarks

Scrapeway runs weekly benchmarks of the most popular web scraping APIs against LinkedIn public profiles. Here's the table for this week:

Service (rank)  Success %  Change  Speed   Change  Cost $/1000  Change
1               92%        -8      33.5s   -0.2    $8.0         +0.03
2               85%        +6      23.4s   -0.9    $2.71        =
3               71%        -15     14.3s   -2.7    $14.7        =
4               40%        -2      3.3s    +0.1    $4.75        =
5               40%        -2      9.3s    +3.8    $6.9         =
6               37%        -3      2.9s    +0.4    $3.27        =
7               0%         (none)  9.7s    -12.0   $40.0        =
Data range Jan 10 - Jan 17
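
The listed price alone does not show what a usable result costs, so the short sketch below works through the arithmetic on the table's numbers. It computes the simple mean of the Cost column and an "effective" cost per 1,000 successful scrapes. The effective cost rests on an assumption that is not stated in the benchmark: that failed requests are billed the same as successful ones. Services that only charge for successful requests would not follow this formula. The function names are illustrative only.

```python
# (success rate, cost in USD per 1,000 requests) copied from the table above,
# in rank order. Service names are not shown in the table.
benchmarks = [
    (0.92, 8.0),
    (0.85, 2.71),
    (0.71, 14.7),
    (0.40, 4.75),
    (0.40, 6.9),
    (0.37, 3.27),
    (0.00, 40.0),
]


def mean_cost(rows):
    """Simple mean of the listed cost per 1,000 requests."""
    return sum(cost for _, cost in rows) / len(rows)


def cost_per_1000_successes(success_rate, cost):
    """Effective cost of 1,000 successful scrapes.

    ASSUMPTION: failed requests are billed like successful ones.
    Returns None when no request succeeds.
    """
    if success_rate == 0:
        return None
    return cost / success_rate


print(f"mean listed cost: ${mean_cost(benchmarks):.2f}")
for rank, (rate, cost) in enumerate(benchmarks, start=1):
    effective = cost_per_1000_successes(rate, cost)
    label = "n/a" if effective is None else f"${effective:.2f}"
    print(f"#{rank}: {rate:.0%} success, ${cost:.2f} listed, {label} effective")
```

Under that billing assumption, a cheap service with a low success rate can end up costing more per usable result than a pricier one that succeeds more often, so it is worth reading the Success and Cost columns together.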

How to scrape linkedin.com?

With the anti-bot bypass provided by web scraping APIs, Linkedin.com is not very difficult to scrape. Most LinkedIn content is static, so a headless browser is not required to scrape LinkedIn effectively. See the benchmarks for the most up-to-date results.

LinkedIn's HTML pages are well structured, so they can be easily parsed using traditional HTML parsing tools like XPath or CSS selectors. In addition, a big chunk of the dataset is also available as JSON-LD structured data embedded in the HTML page.
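
To illustrate what "embedded in the HTML page" means, here is a minimal, dependency-free sketch that pulls JSON-LD documents out of an HTML string using only Python's standard library. The sample HTML is made up for the example and is not real LinkedIn markup, and the `JsonLdExtractor` and `extract_json_ld` names are hypothetical helpers.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed body of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # script content may arrive in several chunks, so buffer it
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_json_ld(html):
    """Return a list of all JSON-LD documents found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents


# Made-up sample page for illustration, NOT real LinkedIn markup.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org",
 "@graph": [{"@type": "WebPage", "url": "https://example.com/in/jane"},
            {"@type": "Person", "name": "Jane Doe"}]}
</script>
</head><body><h1>Jane Doe</h1></body></html>
"""

documents = extract_json_ld(SAMPLE_HTML)
print(documents[0]["@graph"][1]["name"])  # prints: Jane Doe
```

In practice an HTML parsing library with XPath or CSS selector support does the same job in one line; the standard-library version is used here only to keep the sketch self-contained.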

Linkedin.com scraper
import json
from parsel import Selector
# install using `pip install scrapingant-client`
from scrapingant_client import ScrapingAntClient

# create an API client instance
client = ScrapingAntClient(token="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
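def scrape(url: str, country: str="", render_js=False, headers: dict=None) -> Selector:
    api_result = client.general_request(
        url,
        proxy_type='residential',
        proxy_country=country or None,  # optional proxy country code
        browser=render_js,  # LinkedIn pages are mostly static, so off by default
        headers=headers,  # optional custom request headers
        return_page_source=False,
    )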
    assert api_result.ok, api_result.text
    return Selector(api_result.text)

url = "https://www.linkedin.com/in/adammgrant"
selector = scrape(url)

# big chunk of the dataset can be found in microdata markup:
data = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())

# the resulting dataset is pretty big but here are some example fields:
person_data = next(d for d in data['@graph'] if d['@type'] == "Person")
from pprint import pprint
pprint(person_data)

Example output:
{'@type': 'Person',
 'address': {'@type': 'PostalAddress',
             'addressCountry': 'US',
             'addressLocality': 'Filadelfia, Pennsylvania, Estados Unidos'},
 'alumniOf': [{'@type': 'EducationalOrganization',
               'member': {'@type': 'OrganizationRole',
                          'endDate': 2003,
                          'startDate': 1999},
               'name': 'Harvard University',
               'url': 'https://www.linkedin.com/school/harvard-university/'},
              {'@type': 'EducationalOrganization',
               'member': {'@type': 'OrganizationRole',
                          'endDate': 2006,
                          'startDate': 2003},
               'name': '********** ** ********'}],
 'awards': ['100 Most Creative People in Business',
            'Class of 1984 Teaching Award',
            '#1 New York Times bestseller',
            'Thinkers 50 Most Influential Management Thinkers',
            'Class of 1984 Teaching Award',
            'World Economic Forum Young Global Leader',
            'Fellow, Martin Prosperity Institute',
            "HR's Most Influential International Thinkers",
            '"Goes Above and Beyond the Call of Duty" MBA Teaching Award',
            'Class of 1984 Teaching Award',
            'Excellence in Teaching Award',
            'Excellence in Teaching Award',
            'Forbes Most Dynamic Social Innovation Initiatives of 2013',
            'Harvard Business Review Ideas that Shaped Management',
            'Wall Street Journal Favorite Books of 2013',
            'Washington Post Books Every Leader Should Read',
            'Amazon Best Books of the Year',
            'Financial Times Books of the Year',
            'Inc. Best Books of 2013 for Entrepreneurs',
            "Fortune's Five Must-Read Business Books",
            'Oprah Magazine 15 riveting reads to pick up in May',
            'Class of 1984 Teaching Award',
            'Excellence in Teaching Award, MBA Curriculum',
            'New York Times bestseller',
            'Wall Street Journal bestseller',
            '“Goes Above and Beyond the Call of Duty” MBA Teaching Award',
            'BusinessWeek Favorite Professors',
            'Class of 1984 Teaching Award',
            '“Goes Above and Beyond the Call of Duty” MBA Teaching Award',
            'Excellence in Teaching Award, MBA Elective Curriculum',
            'Cummings Scholarly Achievement Award',
            'Distinguished Scientific Award for Early Career Contribution to '
            'Applied Psychology',
            'Distinguished Early Career Contributions Award – Science',
            'Excellence in Teaching Award, MBA Core Curriculum',
            'Excellence in Teaching Award, Undergraduate Division',
            '“Goes Above and Beyond the Call of Duty” MBA Teaching Award',
            'World’s 40 Best Business School Professors Under 40',
            'Owens Scholarly Achievement Award, Best Publication in I/O '
            'Psychology',
            'Excellence in Teaching Award, MBA Elective Curriculum',
            'Excellence in Teaching Award, Undergraduate Division',
            'MBA Teaching All-Star',
            'Tanner Award for Excellence in Undergraduate Teaching',
            'Rensis Likert Prize, Best Paper from a Dissertation in '
            'Organization Studies',
            'Weatherspoon Award for Excellence in Undergraduate Teaching',
            'Best Published Scholarly Article',
            'Early Research Award, Applied Science',
            'Graduate Research Fellowship',
            'Junior Fellow',
            'Manager of the Year'],
 'description': "Recognized as Wharton's top-rated professor, and one of the "
                "world's 10 most influential…",
 'disambiguatingDescription': 'Creator, Top Voice',
 'image': {'@type': 'ImageObject',
           'contentUrl': 'https://media.licdn.com/dms/image/C4E03AQFGdrbBw3FYhA/profile-displayphoto-shrink_200_200/0/1629123595757?e=2147483647&v=beta&t=D7WsVKwVonUGSGEJKzoEzdJiKBwDKx2zVmkm66I3rCM'},
 'interactionStatistic': {'@type': 'InteractionCounter',
                          'interactionType': 'https://schema.org/FollowAction',
                          'name': 'Follows',
                          'userInteractionCount': 5310073},
 'jobTitle': ['******, ****** *********, ***** *****, **** *** ****, '
              '*********, ****** *',
              '*** **** *. ********* ********* ** ********** *** **********',
              '************ **-** ******',
              '******* *******',
              '********* ********* ** **********, **** ******',
              '********* ******',
              '********* ********* ** **********',
              '********* ********* ** ************** ********',
              '******** ** ********* & ******** ***********',
              '******** ** *********** *****'],
 'knowsLanguage': [{'@type': 'Language', 'name': 'English'},
                   {'@type': 'Language', 'name': 'Spanish'}],
 'memberOf': [],
 'name': 'Adam Grant',
 'sameAs': 'https://www.linkedin.com/in/adammgrant',
 'url': 'https://www.linkedin.com/in/adammgrant',
 'worksFor': [{'@type': 'Organization',
               'member': {'@type': 'OrganizationRole'},
               'name': 'Penguin Publishing Group',
               'url': 'https://www.linkedin.com/company/penguin-group-usa'},
              {'@type': 'Organization',
               'location': 'Philadelphia, PA',
               'member': {'@type': 'OrganizationRole'},
               'name': '*** ******* ******'},
              {'@type': 'Organization',
               'member': {'@type': 'OrganizationRole'},
               'name': '*** *** **** *****'},
              {'@type': 'Organization',
               'member': {'@type': 'OrganizationRole'},
               'name': '********** ******** ******'},
              {'@type': 'Organization',
               'location': 'Philadelphia, PA',
               'member': {'@type': 'OrganizationRole'},
               'name': '*** ******* ******'},
              {'@type': 'Organization',
               'member': {'@type': 'OrganizationRole'},
               'name': '******* ** ********** *******'},
              {'@type': 'Organization',
               'location': 'Philadelphia, PA',
               'member': {'@type': 'OrganizationRole'},
               'name': '*** ******* ******'},
              {'@type': 'Organization',
               'location': 'Chapel Hill, NC',
               'member': {'@type': 'OrganizationRole'},
               'name': '********** ** ***** ******** ** ****** ****'},
              {'@type': 'Organization',
               'location': 'Cambridge, MA',
               'member': {'@type': 'OrganizationRole'},
               'name': "***'* ** ************"},
              {'@type': 'Organization',
               'location': 'Cambridge, MA',
               'member': {'@type': 'OrganizationRole'},
               'name': "***'* ** ************"}]}

The example above works for any public LinkedIn HTML page, such as a personal profile or a company profile. The majority of the dataset can be found in the JSON-LD structured data inside a script element of the page (script type application/ld+json). The remaining fields are present in the page HTML and can be extracted using CSS or XPath selectors in the same manner.
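
Since the full Person node is large, it is often handier to reduce it to a flat record. The sketch below shows one possible way to do that, using a trimmed copy of the example output above as input. The helper names `find_node` and `summarize_person` and the choice of fields are illustrative assumptions; field availability on real pages may differ, which is why every lookup falls back to a default.

```python
def find_node(json_ld, node_type):
    """Return the first node with the given @type, or None.

    Handles a top-level "@graph" list (as in the output above) as well as
    a document that is itself a single node.
    """
    nodes = json_ld.get("@graph", [json_ld])
    return next((n for n in nodes if n.get("@type") == node_type), None)


def summarize_person(person):
    """Flatten a schema.org Person node into a small record."""
    stats = person.get("interactionStatistic") or {}
    address = person.get("address") or {}
    return {
        "name": person.get("name"),
        "url": person.get("url"),
        "location": address.get("addressLocality"),
        "followers": stats.get("userInteractionCount"),
        "languages": [lang["name"] for lang in person.get("knowsLanguage", [])],
        "schools": [school["name"] for school in person.get("alumniOf", [])],
    }


# Trimmed copy of the example output above.
data = {
    "@graph": [
        {
            "@type": "Person",
            "name": "Adam Grant",
            "url": "https://www.linkedin.com/in/adammgrant",
            "address": {"@type": "PostalAddress", "addressCountry": "US"},
            "interactionStatistic": {
                "@type": "InteractionCounter",
                "userInteractionCount": 5310073,
            },
            "knowsLanguage": [
                {"@type": "Language", "name": "English"},
                {"@type": "Language", "name": "Spanish"},
            ],
            "alumniOf": [
                {"@type": "EducationalOrganization", "name": "Harvard University"},
            ],
        }
    ]
}

person = find_node(data, "Person")
summary = summarize_person(person)
print(summary["name"], summary["followers"], summary["languages"])
```

Returning None for missing fields, rather than raising, keeps a batch job running when a profile lacks a section such as education or languages.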

Why scrape LinkedIn public profiles?

Web scraping Linkedin.com has many real-world and commercial applications, mostly related to brand awareness and the employment industry.

With job monitoring scraping we can keep track of LinkedIn job listings and how they change over time, giving insight into market trends. By scraping Linkedin.com job search we can also aggregate employment data for specific regions and fields. For example, by scraping "Typescript Developers in San Francisco" we can keep track of Typescript opportunities in one particular area and how they change over time.

LinkedIn data scraping can also be used in market research. LinkedIn contains not only job listing details but also comprehensive company profile pages, which in combination can be used to create reliable market datasets.

Linkedin.com is often scraped for competitive analysis by recruiters who post their own job listings on the platform, as it can help optimize those listings for current market trends.

Just like other social networks, LinkedIn is scraped for company posts and announcements for signal tracking, be it for research or stock market predictions.

Finally, LinkedIn contains a lot of user-generated content like posts and reviews, which can be used for sentiment analysis and reputation management as well as AI training.