Disclaimer: Real Data API only extracts publicly available data while maintaining a strict policy against collecting any personal or identity-related information.
With our Walmart Product Data Scraper, you can quickly gather important product information, such as descriptions, images, feedback, questions, prices, and shipping details. You can customize your search by selecting your preferred country, language, and shipping region. Supported countries include Australia, Canada, Germany, France, Singapore, the USA, the UK, the UAE, and India.
Since Walmart doesn't offer an official API, this scraper helps you collect Walmart data without one.
The Walmart product data scraper gives you the following advantages.
Don't be surprised if the scraper returns slightly different products than the ones you browsed. Walmart orders products a little differently for every buyer.
This Walmart Scraper is under development. You can contact us immediately if you face any issues or have any feature requests.
Check out the video below to learn how this scraper works.
Here is the link to watch the output video.
The Walmart scraper expects JSON input containing page lists with the following fields.
Field | Type | Description |
---|---|---|
startUrls | Array | Product detail, category, or search URLs only. |
maxItems | Integer | Limits the number of extracted products. This is helpful when you explore large subcategories on Walmart. |
endPage | Integer | The last page to scrape; the default is infinite. It applies to each list request. |
search | String | Keywords to search for on Walmart. |
proxy | Object | Proxy configuration. |
extendOutputFunction | String | A function that takes a jQuery handle ($) as its argument and returns a data object. |
outputFilterFunction | String | A function that takes a result item as its argument and returns the mapped data. |
You need proxy servers to use this solution. You can use your own proxies or Real Data API proxies.
Note that, for data protection, the API returns all possible results. We suggest that you always use outputFilterFunction.
To filter a category, visit Walmart, apply the filters on the product category page, and copy the resulting URL into startUrls.
To extract only the first page of a category or search list, provide the link to that list and set endPage to 1.
The same approach retrieves any page interval. If you provide the link to the 7th page of a Walmart category and set endPage to 8, you will get only the seventh and eighth pages.
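As a rough illustration of the page-interval tip, here is a minimal Python sketch that builds an input object for a given range of pages. The helper name `build_page_range_input` is hypothetical, and the sketch assumes that Walmart list URLs select a page with a `page` query parameter; check the URL in your browser before relying on that.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def build_page_range_input(list_url: str, first_page: int, last_page: int) -> dict:
    """Build scraper input covering first_page..last_page of one list URL."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    parts = urlparse(list_url)
    # Keep the existing query parameters (filters, search terms).
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Assumption: the list URL selects its page with ?page=N.
    query["page"] = str(first_page)
    start_url = urlunparse(parts._replace(query=urlencode(query)))
    # endPage tells the scraper where to stop, as described above.
    return {"startUrls": [{"url": start_url}], "endPage": last_page}


example = build_page_range_input(
    "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets", 7, 8
)
print(example)
```

The returned dictionary can be merged with the other input fields (proxy, maxItems, and so on) before starting a run.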
The Walmart Scraper uses this function to map the output that the API scrapes from Walmart. Internally, it runs the following:
data = eval(outputFilterFunction)(data);
You can therefore use this function to select attributes. The example below keeps only the id and name attributes.
(object) => ({
id: object.id,
name: object.name
})
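Because outputFilterFunction is passed as a string, it can be convenient to generate it from a list of attribute names. The following Python sketch does that; `make_output_filter` is a hypothetical helper, and only `id` and `name` are attribute names confirmed by the example above.

```python
import re

# Only plain JavaScript identifiers are accepted, so the generated
# source can use simple dot access (object.name).
_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def make_output_filter(fields):
    """Return JavaScript arrow-function source that keeps only `fields`."""
    if not fields:
        raise ValueError("at least one field is required")
    pairs = []
    for name in fields:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
        pairs.append(f"{name}: object.{name}")
    return "(object) => ({ " + ", ".join(pairs) + " })"


filter_source = make_output_filter(["id", "name"])
print(filter_source)  # (object) => ({ id: object.id, name: object.name })
```

The resulting string can be used as the value of outputFilterFunction in the JSON input.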
We've optimized this API to run fast and extract as many products as possible. To do so, it prioritizes product detail requests. If Walmart doesn't block the scraper frequently, it scrapes about 50 products in 120 seconds and uses 0.3 to 0.5 compute units.
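To get a feel for these figures, the Python sketch below extrapolates them to other run sizes. It assumes that time and compute units scale linearly with the number of products, which the figures above do not guarantee; blocking, retries, and page layout can all change the real numbers. The function name `estimate_run` is hypothetical.

```python
# Quoted figures: about 50 products in 120 seconds, 0.3 to 0.5 compute units.
SECONDS_PER_PRODUCT = 120 / 50
CU_PER_PRODUCT_LOW = 0.3 / 50
CU_PER_PRODUCT_HIGH = 0.5 / 50


def estimate_run(products: int) -> dict:
    """Linearly extrapolate run time and compute units (an assumption)."""
    if products < 0:
        raise ValueError("products must be non-negative")
    return {
        "seconds": products * SECONDS_PER_PRODUCT,
        "compute_units_low": products * CU_PER_PRODUCT_LOW,
        "compute_units_high": products * CU_PER_PRODUCT_HIGH,
    }


print(estimate_run(100))
```

Under that linear assumption, 100 products would take roughly 240 seconds and 0.6 to 1.0 compute units.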
{
"startUrls": [
{
"url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
},
{
"url": "https://www.walmart.com/browse/home/"
},
{
"url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
},
{
"url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
}
],
"search": "apples",
"endPage": 6,
"maxItems": 100,
"outputFilterFunction": "(object) => ({...object})"
}
While running, the scraper logs messages that describe what is happening. Every message contains a short label that identifies the page being scraped.
After items load, you should see a message with the total and loaded item counts for every page.
If you provide invalid input, the scraper fails and shows the reason for the failure in the output.
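To catch invalid input before starting a run, you could check it locally first. The Python sketch below compares an input dictionary against the field types listed in the table above and treats proxy as required, as the field reference states. `validate_input` is a hypothetical helper, not part of the Real Data API client, and the scraper's own validation may be stricter.

```python
# Field types taken from the input table in this document.
EXPECTED_TYPES = {
    "startUrls": list,
    "maxItems": int,
    "endPage": int,
    "search": str,
    "proxy": dict,
    "extendOutputFunction": str,
    "outputFilterFunction": str,
}


def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, value in run_input.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            problems.append(f"unknown field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    if "proxy" not in run_input:
        problems.append("proxy is required")
    for entry in run_input.get("startUrls") or []:
        if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
            problems.append("each startUrls entry needs a string 'url'")
    return problems


print(validate_input({"maxItems": "50"}))
```

An empty list means the input passed these basic checks; it does not guarantee that the run itself will succeed.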
While running, the API saves the output into a dataset, and every item is unique.
You can retrieve the output from any programming language, such as PHP, Node.js, or Python.
You need a Real Data API account to run the program examples. Replace <YOUR_API_TOKEN> in the program with your own API token. See the Real Data API docs for more details on the live APIs.
import { RealdataAPIClient } from 'RealdataAPI-Client';
// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
token: '<YOUR_API_TOKEN>',
});
// Prepare actor input
const input = {
"startUrls": [
{
"url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
},
{
"url": "https://www.walmart.com/browse/home/"
},
{
"url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
},
{
"url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
}
],
"maxItems": 50,
"endPage": 1,
"extendOutputFunction": ($) => {
const result = {};
// Uncomment to add a title to the output
// result.title = $('title').text().trim();
return result;
},
"outputFilterFunction": (object) => ({...object}),
"proxy": {
"useRealdataAPIProxy": true
}
};
(async () => {
// Run the actor and wait for it to finish
const run = await client.actor("epctex/walmart-scraper").call(input);
// Fetch and print actor results from the run's dataset (if any)
console.log('Results from dataset');
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
console.dir(item);
});
})();
from RealdataAPI_client import RealdataAPIClient
# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")
# Prepare the actor input
run_input = {
"startUrls": [
{ "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920" },
{ "url": "https://www.walmart.com/browse/home/" },
{ "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets" },
{ "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382" },
],
"maxItems": 50,
"endPage": 1,
"extendOutputFunction": """($) => {
const result = {};
// Uncomment to add a title to the output
// result.title = $('title').text().trim();
return result;
}""",
"outputFilterFunction": "(object) => ({...object})",
"proxy": { "useRealdataAPIProxy": True },
}
# Run the actor and wait for it to finish
run = client.actor("epctex/walmart-scraper").call(run_input=run_input)
# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
# Set API token
API_TOKEN=<YOUR_API_TOKEN>
# Prepare actor input
cat > input.json <<'EOF'
{
"startUrls": [
{
"url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
},
{
"url": "https://www.walmart.com/browse/home/"
},
{
"url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
},
{
"url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
}
],
"maxItems": 50,
"endPage": 1,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.title = $('title').text().trim();\n\n return result;\n}",
"outputFilterFunction": "(object) => ({...object})",
"proxy": {
"useRealdataAPIProxy": true
}
}
EOF
# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/epctex~walmart-scraper/runs?token=$API_TOKEN" \
-X POST \
-d @input.json \
-H 'Content-Type: application/json'
startUrls
Optional Array
Links to begin with. Provide a list of product detail, category, or search URLs.
maxItems
Optional Integer
Maximum item count that you wish to extract.
endPage
Optional Integer
The page number at which to finish the run. The default value is zero.
search
Optional String
Search keywords you want to explore on the source platform.
extendOutputFunction
Optional String
The output of this function is merged with the default results.
outputFilterFunction
Optional String
This function helps to map scraped output results according to your choices.
proxy
Required Object
Choose proxy servers to help your crawler.
{
"startUrls": [
{
"url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
},
{
"url": "https://www.walmart.com/browse/home/"
},
{
"url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
},
{
"url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
}
],
"maxItems": 50,
"endPage": 1,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.title = $('title').text().trim();\n\n return result;\n}",
"outputFilterFunction": "(object) => ({...object})",
"proxy": {
"useRealdataAPIProxy": true
}
}