Web Scraping Walmart.com Overview

2024-04-08

Walmart is one of the biggest e-commerce retailers in the United States, and its website contains product data for its brick-and-mortar stores as well as its online store.

Walmart uses proprietary anti-scraping mechanisms that are constantly evolving. This makes it difficult to scrape Walmart data reliably, which is where web scraping APIs come in handy.

Overall, most web scraping APIs we've tested through our benchmarks perform well for Walmart at $3.34 per 1,000 scrape requests on average.

Walmart.com scraping API benchmarks

Scrapeway runs weekly benchmarks for Walmart Products for the most popular web scraping APIs. Here's the table for this week:

Service  Success %  (change)  Speed   (change)  Cost $/1000  (change)
1        100%       =         5.6s    -0.9      $6.9         =
2        97%        -1        8.3s    +1.3      $2.45        =
3        97%        -3        14.2s   +5.4      $4.09        -0.08
4        90%        +44       45.8s   +3.1      $1.9         =
5        80%        -1        30.2s   -0.7      $2.55        +0.28
6        39%        -8        2.1s    -0.2      $3.27        =
7        1%                   1.2s              $2.2

Data range Nov 01 - Nov 08
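
As a quick sanity check on the average cost quoted in the overview, the snippet below averages the Cost column of the table above. It assumes the quoted figure is a plain unweighted mean across the seven listed services, which is an inference from the numbers rather than something the benchmark states.

```python
from statistics import mean

# Cost per 1,000 scrape requests for services 1-7, copied from the table above
costs_per_1000 = [6.9, 2.45, 4.09, 1.9, 2.55, 3.27, 2.2]

# Assumption: the quoted average is an unweighted mean over all listed services
average_cost = round(mean(costs_per_1000), 2)
print(f"${average_cost} per 1,000 scrape requests")
```

Under that assumption the result matches the $3.34 figure.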

How to scrape walmart.com?

Walmart is relatively easy to scrape: its pages are mostly static content with a few dynamic elements, so a headless browser is not required.

That being said, Walmart has a lot of anti-scraping mechanisms in place, so it's recommended to use a reliable web scraping service that can bypass the constantly changing anti-scraping measures. See benchmarks for the most up-to-date results.

Walmart's HTML can be difficult to parse because of the sheer number of data points. However, many of these data points can be accessed through the variables of the Next.js framework that Walmart uses. To do this, look for the __NEXT_DATA__ variable in the HTML source.
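
To illustrate what that lookup involves, here is a minimal offline sketch that uses only the Python standard library. The sample HTML is a tiny made-up document (not a real Walmart page), and the `NextDataParser` and `extract_next_data` names are hypothetical helpers introduced for this example.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the script is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


# Made-up sample document that mimics the nesting used for product pages
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"initialData": {"data": {"product": {"name": "Example product"}}}}}}
</script>
</body></html>
"""

data = extract_next_data(SAMPLE_HTML)
product = data["props"]["pageProps"]["initialData"]["data"]["product"]
print(product["name"])
```

Returning None when the script tag is absent makes it easy to tell a blocked or changed page apart from a parsing error.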

Walmart.com scraper
import json
from pprint import pprint
from typing import Optional

from parsel import Selector
# install using `pip install scrapingbee`
from scrapingbee import ScrapingBeeClient

# create an API client instance
client = ScrapingBeeClient(api_key="YOUR API KEY")

# create scrape function that returns HTML parser for a given URL
def scrape(url: str, country: str = "US", render_js: bool = False, headers: Optional[dict] = None) -> Selector:
    api_result = client.get(
        url,
        headers=headers,
        params={
            "json_response": True,
            "transparent_status_code": True,
            "premium_proxy": "True",
            # use the function arguments instead of hardcoded values
            "country_code": country,
            "render_js": str(render_js),
        },
    )
    assert api_result.ok, api_result.text
    data = api_result.json()
    return Selector(data["body"])

url = "https://www.walmart.com/ip/Apple-MacBook-Air-13-3-inch-Laptop-Space-Gray-M1-Chip-8GB-RAM-256GB-storage/609040889"
selector = scrape(url)
# Walmart is using the Next.js framework so the product data is stored in a JSON variable
data = selector.xpath('//script[@id="__NEXT_DATA__"]/text()').get()
# fail early with a clear message if the page was blocked or its layout changed
assert data, "__NEXT_DATA__ script not found in the response"
data = json.loads(data)
product = data["props"]["pageProps"]["initialData"]["data"]["product"]

# the resulting dataset is pretty big but here are some example fields:
pprint(product)
{
  "id": "4SZSM8SXAAJT",
  "name": "Apple MacBook Air 13.3 inch Laptop - Space Gray, M1 Chip, 8GB RAM, 256GB storage",
  "shortDescription": "Introducing The 13-inch MacBook Air with the Apple M1 chip is incredibly thin and light with a silent fanless design. It delivers remarkable performance and up to 18 hours of battery life. And it has a beautiful Retina display for super sharp text and vibrant colors. Amazing performance, Unbeatable price. It's a laptop you’re going to love!",
  "additionalOfferCount": 2,
  "availabilityStatus": "IN_STOCK",
  "averageRating": 4.7,
  "associatedBundleId": null,
  "suppressReviews": false,
  "brand": "Apple",
  "productTypeId": "710",
  "model": "MGN63LL/A",
  "buyNowEligible": true,
  "fulfillmentType": "FC",
  "fulfillmentBadge": "Tomorrow",
  "checkStoreAvailabilityATC": false,
  "checkAvailabilityGlobalDFS": false,
  "hasSellerBadge": null,
  "hasCarePlans": true,
  "hasHomeServices": null,
  "itemType": null,
  "primaryUsItemId": "609040889",
  "conditionType": "New",
  "imageInfo": {
    "allImages": [
      {
        "id": "0D4F1BA24DB24A7F89FA742D2A069922",
        "url": "https://i5.walmartimages.com/seo/Apple-MacBook-Air-13-3-inch-Laptop-Space-Gray-M1-Chip-8GB-RAM-256GB-storage_af1d4133-6de9-4bdc-b1c6-1ca8bd0af7a0.c0eb74c31b2cb05df4ed11124d0e255b.jpeg",
        "zoomable": true
      },
      "...truncated...",
    ],
  },
  "priceInfo": {
    "currentPrice": {
      "price": 699,
      "priceString": "$699.00",
      "variantPriceString": "$699.00",
      "currencyUnit": "USD",
      "bestValue": null,
      "priceDisplay": "$699.00"
    },
  "...truncated..."

To scrape walmart.com above, we use HTML scraping and extract a JSON variable that contains the product data. This variable can be found under __NEXT_DATA__ in the HTML source.
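
Since the full product dataset is large, it is often handy to reduce it to a few fields of interest. The sketch below shows one way to do that, assuming the field names seen in the example output above. The `summarize_product` helper is a hypothetical name introduced here, and the sample dict is a trimmed copy of that output rather than a live response.

```python
def summarize_product(product: dict) -> dict:
    """Reduce the large product dict to a handful of commonly used fields."""
    # nested sections may be missing or null, so fall back to empty containers
    price = (product.get("priceInfo") or {}).get("currentPrice") or {}
    images = (product.get("imageInfo") or {}).get("allImages") or []
    return {
        "id": product.get("id"),
        "name": product.get("name"),
        "brand": product.get("brand"),
        "availability": product.get("availabilityStatus"),
        "rating": product.get("averageRating"),
        "price": price.get("price"),
        "currency": price.get("currencyUnit"),
        # skip any entries that are not image objects with a URL
        "image_urls": [
            img["url"] for img in images
            if isinstance(img, dict) and "url" in img
        ],
    }


# Trimmed copy of the example output above (not a live response)
sample_product = {
    "id": "4SZSM8SXAAJT",
    "name": "Apple MacBook Air 13.3 inch Laptop - Space Gray, M1 Chip, 8GB RAM, 256GB storage",
    "brand": "Apple",
    "availabilityStatus": "IN_STOCK",
    "averageRating": 4.7,
    "imageInfo": {"allImages": []},
    "priceInfo": {"currentPrice": {"price": 699, "currencyUnit": "USD"}},
}

summary = summarize_product(sample_product)
print(summary["name"], summary["price"], summary["currency"])
```

Using `.get()` throughout means a missing or null field yields None instead of raising, which is useful because not every product page is guaranteed to include every field.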

Why scrape Walmart Products?

Walmart is a popular target for web scraping as it contains a massive e-commerce dataset that can be used for various purposes such as price monitoring, market research, and competitive analysis.

With price monitoring scraping we can keep track of the product's historic pricing data and take advantage of market fluctuations to make better purchasing decisions or investments.

Market research scraping, and especially Walmart review scraping, can help understand customer preferences through sentiment analysis, identify trends through statistics, and inform decisions about new product development and marketing strategies.

Walmart is also often scraped by Walmart partners to monitor brand awareness and performance and adjust their negotiation strategies.

Finally, Walmart contains so much data that it can be used in AI model training.