Disclaimer: Real Data API only extracts publicly available data while maintaining a strict policy against collecting any personal or identity-related information.
Use the Reddit Data Scraper to collect Reddit data such as subreddits, categories, comments, likes, and user profiles. You can use the scraped data for business projects, reports, and market analysis. Reddit Data Scraper and Reddit Scraper API are accessible from Canada, France, Australia, Germany, the USA, and the UK, among other countries.
This unofficial Reddit API collects unlimited data from Reddit without authentication. It lets you extract posts and comments, along with some user information, without logging in. It is built with the Real Data API SDK, and you can run it locally or on our platform.
Reddit Scraper permits you to:
Try our dedicated free Reddit Scraper if you want to extract Reddit data quickly on a smaller scale. Just enter keywords or Reddit URLs and click the scrape option. Note that the free Reddit Scraper can scrape up to 10 comments, 10 posts, 2 leaderboard items, and 2 subreddits.
Reddit Scraper on the Real Data API platform gives you one thousand results for 4 USD in platform credits. The 5 USD of free platform credit included in our monthly free plan covers this.
If you want to scrape more Reddit data, consider our monthly personal plan at 49 USD, which covers over ten thousand results a month.
You don't need any coding knowledge or skill to use Reddit Data Scraper API. If you don't know where to begin, follow the step-by-step video tutorial below. You can also use this tutorial for the free Reddit Scraper.
There are two methods to scrape Reddit if you run Reddit Scraper on the Real Data API platform.
Almost any link from Reddit will return a dataset. Before scraping a page, the scraper displays a message if the URL is not supported.
These are a few input examples of Reddit URLs that you can scrape.
Note: If you pass a search link in the startUrls parameter, the scraper will only scrape posts. To search for Reddit users or communities, use a specific URL or the search field.
Search Types: Denotes which part of Reddit you are scraping: users, communities, or posts.
Search Term: The keyword you want to look up with Reddit search. You can add one term or several. Do not use this field if you are using startUrls.
Sort Search: Sorts search results by Top, Hot, New, number of comments, or Relevance.
Filter by Time or Date: Restricts the search to the last hour, day, week, month, or year. You can use it only while scraping posts.
To check the entire parameter list, how to set default values, and actual default values, go to the Input Schema tab.
The following input example shows how the input fields look when you scrape all Reddit communities and users matching the keyword parrots. The output is sorted with the newest results first.
{
"maxItems": 10,
"maxPostCount": 10,
"maxComments": 10,
"maxCommunitiesAndUsers": 10,
"maxLeaderBoardItems": 10,
"scrollTimeout": 40,
"proxy": {
"useRealdataAPIProxy": true
},
"debugMode": false,
"searches": ["parrots"],
"type": "communities_and_users",
"sort": "new",
"time": "all"
}
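If you prefer to assemble this input in code, the following is a minimal Python sketch. It is not part of Reddit Scraper: the helper name build_input is hypothetical, the field names are copied from the example above, and the rule it enforces (use either searches or startUrls, not both) is taken from the Search Term note. The allowed time values are only the ones listed on this page.

```python
import json

# Values listed for the "time" parameter on this page.
TIME_VALUES = {"hour", "day", "week", "month", "year", "all"}


def build_input(searches=None, start_urls=None,
                search_type="communities_and_users",
                sort="new", time="all", max_items=10):
    """Assemble a scraper input dict (hypothetical helper, not an official API)."""
    # The page says search terms and start URLs should not be combined.
    if bool(searches) == bool(start_urls):
        raise ValueError("provide either searches or start_urls, but not both")
    if time not in TIME_VALUES:
        raise ValueError(f"unsupported time filter: {time!r}")

    actor_input = {
        "maxItems": max_items,
        "proxy": {"useRealdataAPIProxy": True},
        "debugMode": False,
    }
    if start_urls:
        actor_input["startUrls"] = [{"url": url} for url in start_urls]
    else:
        actor_input.update({
            "searches": list(searches),
            "type": search_type,
            "sort": sort,
            "time": time,
        })
    return actor_input


print(json.dumps(build_input(searches=["parrots"]), indent=2))
```

Keeping the two modes in one function makes it harder to send a search term together with start URLs by accident.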
The scraper stores its output in a dataset, where each comment, community, post, or user is a separate item. Once the Reddit API finishes the run, you can export the scraped Reddit data to your device or to any web application in multiple usable formats. Check out the output examples below for various inputs.
{
"id": "ss5c25",
"title": "Weekly Questions Thread / Open Discussion",
"description": "For any questions regarding dough, sauce, baking methods, tools, and more, comment below.You can also post any art, tattoos, comics, etc here. Keep it SFW, though.As always, our wiki has a few sauce recipes and recipes for dough.Feel free to check out threads from weeks ago.This post comes out every Monday and is sorted by 'new'.",
"numberOfVotes": "4",
"createdAt": "3 days ago",
"scrapedAt": "2022-01-09T22:52:48.489Z",
"username": "u/AutoModerator",
"numberOfComments": "19",
"mediaElements": [],
"tag": "HELP",
"dataType": "post"
}
{
"url": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/t1_hx9k9it",
"username": "Acct-404",
"createdAt": "9 h ago",
"scrapedAt": "2022-03-09T12:52:48.547Z",
"description": "Raises handUhhhh can I get some cheese on my pizza please?",
"numberOfVotes": "3",
"postUrl": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/",
"postId": "sud2hm",
"dataType": "comment"
}
{
"title": "Pizza",
"alternativeTitle": "r/Pizza",
"createdAt": "Created Aug 26, 2008",
"scrapedAt": "2022-03-09T12:54:42.721Z",
"members": 366000,
"moderators": [
"6745408",
"AutoModerator",
"BotTerminator",
"DuplicateDestroyer"
],
"url": "https://www.reddit.com/r/pizza/",
"dataType": "community",
"categories": ["hot", "new", "top", "rising"]
}
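Because posts, comments, and communities share one dataset, a common first step after export is to split the items by their dataType field. The Python sketch below assumes you have already loaded the exported items as a list of dicts; the helper names group_by_type and to_int are hypothetical, and the sample records are trimmed copies of the examples above. Note that counts such as numberOfVotes arrive as strings in those examples.

```python
from collections import defaultdict


def to_int(value, default=0):
    """Convert a count such as "19" to int; use default if it is not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def group_by_type(items):
    """Group dataset items by their dataType field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("dataType", "unknown")].append(item)
    return dict(groups)


# Trimmed copies of the sample records shown above.
items = [
    {"id": "ss5c25", "numberOfVotes": "4", "numberOfComments": "19", "dataType": "post"},
    {"postId": "sud2hm", "numberOfVotes": "3", "dataType": "comment"},
    {"title": "Pizza", "members": 366000, "dataType": "community"},
]

groups = group_by_type(items)
for data_type, records in groups.items():
    total_votes = sum(to_int(record.get("numberOfVotes")) for record in records)
    print(data_type, len(records), total_votes)
```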
If you need to restrict the scope of your search, you can set the maximum number of posts to scrape for a user or inside a community. Using the parameters below, you can also limit the number of comments per post, the number of communities, and the number of leaderboard items.
{
"maxPostCount": 50,
"maxComments": 10,
"maxCommunitiesAndUsers": 5,
"maxLeaderBoardItems": 5
}
To prevent a long actor run, you can set maxItems. The scraper stops once it reaches the number of results you asked for, so take care not to trim your output unintentionally.
Visit the Input Schema tab for the full list of ways to limit Reddit Scraper: maxLeaderBoardItems, maxComments, maxItems, maxCommunitiesAndUsers, and maxPostCount.
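As a rough illustration of the trimming risk, the Python sketch below flags limits that are larger than maxItems. The helper name limits_exceeding_max_items is hypothetical, and the comparison is only a heuristic assumption about how the limits interact; the scraper itself may count items differently.

```python
# Limit parameters named on this page, other than maxItems itself.
PER_TYPE_LIMITS = (
    "maxPostCount",
    "maxComments",
    "maxCommunitiesAndUsers",
    "maxLeaderBoardItems",
)


def limits_exceeding_max_items(actor_input):
    """Return the limits whose value is larger than maxItems (heuristic check)."""
    max_items = actor_input.get("maxItems")
    if max_items is None:
        return []
    return [
        name for name in PER_TYPE_LIMITS
        if actor_input.get(name, 0) > max_items
    ]


actor_input = {
    "maxItems": 10,
    "maxPostCount": 50,
    "maxComments": 10,
    "maxCommunitiesAndUsers": 5,
    "maxLeaderBoardItems": 5,
}
for name in limits_exceeding_max_items(actor_input):
    print(f"{name} is larger than maxItems, so the run may stop before reaching it")
```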
You can use the extend output function to update the output of this scraper. It lets you choose which data to scrape from the Reddit page, and the object it returns is merged with the scraper's output.
You can return fields to achieve three different things.
async () => {
return {
pageTitle: document.querySelector("title").innerText,
};
};
The function above adds the page title to the final object:
{
"title": "Pizza",
"alternativeTitle": "r/Pizza",
"createdAt": "Created Aug 26, 2008",
"scrapedAt": "2022-03-08T21:57:25.832Z",
"members": 366000,
"moderators": [
"6745408",
"AutoModerator",
"BotTerminator",
"DuplicateDestroyer"
],
"url": "https://www.reddit.com/r/pizza/",
"categories": ["hot", "new", "top", "rising"],
"dataType": "community",
"pageTitle": "homemade chicken cheese masala pasta"
}
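The Python sketch below shows two things under stated assumptions. First, since extendOutputFunction is listed as an optional string input, the JavaScript function can be passed as a plain string in the run input. Second, the merge_output helper (a hypothetical name) imitates the merge with ordinary dicts; the real merge happens inside the scraper, and the assumption here is that returned fields are added on top of the scraped ones.

```python
import json

# The JavaScript from the example above, passed as a string.
EXTEND_OUTPUT_FUNCTION = """async () => {
    return {
        pageTitle: document.querySelector("title").innerText,
    };
}"""

run_input = {
    "startUrls": [{"url": "https://www.reddit.com/r/pizza/"}],
    "maxItems": 10,
    "extendOutputFunction": EXTEND_OUTPUT_FUNCTION,
    "proxy": {"useRealdataAPIProxy": True},
}


def merge_output(scraped, extra):
    """Return a copy of scraped with the fields from extra added on top."""
    merged = dict(scraped)
    merged.update(extra)
    return merged


scraped = {"title": "Pizza", "alternativeTitle": "r/Pizza", "dataType": "community"}
extra = {"pageTitle": "homemade chicken cheese masala pasta"}
print(json.dumps(merge_output(scraped, extra), indent=2))
```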
You should have a Real Data API account to run the program examples. Replace <YOUR_API_TOKEN> in the code with your API token. For more explanation, read about the live APIs in the Real Data API docs.
import { RealdataAPIClient } from 'RealdataAPI-Client';
// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
token: '<YOUR_API_TOKEN>',
});
// Prepare actor input
const input = {
"startUrls": [
{
"url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
}
],
"maxItems": 10,
"maxPostCount": 10,
"maxComments": 10,
"maxCommunitiesAndUsers": 2,
"maxLeaderBoardItems": 2,
"scrollTimeout": 40,
"proxy": {
"useRealdataAPIProxy": true
}
};
(async () => {
// Run the actor and wait for it to finish
const run = await client.actor("trudax/reddit-scraper").call(input);
// Fetch and print actor results from the run's dataset (if any)
console.log('Results from dataset');
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
console.dir(item);
});
})();
from RealdataAPI_client import RealdataAPIClient
# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")
# Prepare the actor input
run_input = {
"startUrls": [{ "url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/" }],
"maxItems": 10,
"maxPostCount": 10,
"maxComments": 10,
"maxCommunitiesAndUsers": 2,
"maxLeaderBoardItems": 2,
"scrollTimeout": 40,
"proxy": { "useRealdataAPIProxy": True },
}
# Run the actor and wait for it to finish
run = client.actor("trudax/reddit-scraper").call(run_input=run_input)
# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
# Set API token
API_TOKEN="<YOUR_API_TOKEN>"
# Prepare actor input
cat > input.json <<'EOF'
{
"startUrls": [
{
"url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
}
],
"maxItems": 10,
"maxPostCount": 10,
"maxComments": 10,
"maxCommunitiesAndUsers": 2,
"maxLeaderBoardItems": 2,
"scrollTimeout": 40,
"proxy": {
"useRealdataAPIProxy": true
}
}
EOF
# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/trudax~reddit-scraper/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
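The same request can be prepared with Python's standard library. This is a minimal sketch that only builds the request object; the URL and actor ID are copied from the curl command above, while the helper name build_run_request is hypothetical. Nothing is sent until you pass the request to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl command above.
API_BASE = "https://api.RealdataAPI.com/v2"


def build_run_request(token, actor_input, actor_id="trudax~reddit-scraper"):
    """Build (but do not send) the POST request that starts an actor run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/runs?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("demo-token", {"maxItems": 10})
print(request.get_method(), request.full_url)

# To start the run for real, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```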
startUrls
Optional Array
If you already have the URLs of the pages you want to scrape, add them here. To use the search field below instead, remove every start URL.
searches
Optional Array
Enter a search term here to look it up with Reddit search.
searchPosts
Optional Boolean
Search for posts matching the provided search term.
searchComments
Optional Boolean
Search for comments matching the provided search term.
searchCommunities
Optional Boolean
Search for communities matching the provided search term.
searchUsers
Optional Boolean
Search for Reddit users matching the provided search term.
sort
Optional String
Sort the search by comments, top, relevance, hot, or new.
Values: "hot", "relevance", "new", "comment", "top", etc.
time
Optional String
Filter posts by the last hour, day, week, month, or year.
Values: "hour", "year", "all", "week", "day", "month".
maxItems
Optional Integer
The maximum number of items saved to the dataset. If you are scraping users and communities, remember that the scraper saves every category inside the dataset as a separate item.
maxPostCount
Optional Integer
The maximum number of posts that the scraper stores for each post, community, page, or user link.
maxComments
Optional Integer
The maximum number of comments scraped from each comment page. Set it to 0 if you do not want to extract comments.
maxCommunitiesAndUsers
Optional Integer
The maximum number of community and user pages scraped when your start URL or search is of the user or community type.
maxLeaderBoardItems
Optional Integer
The maximum number of communities scraped from a leaderboard page.
extendOutputFunction
Optional String
You can write custom JavaScript code to scrape the custom data from the Reddit page.
scrollTimeout
Optional Integer
The timeout, in seconds, after which the scraper stops scrolling down the page to look for new items.
proxy
Required Object
Choose a Real Data API proxy server or use your proxy to support the Scraper.
debugMode
Optional Boolean
See detailed logs by activating debug mode.
{
"startUrls": [
{
"url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
}
],
"searchPosts": true,
"searchComments": false,
"searchCommunities": false,
"searchUsers": false,
"maxItems": 10,
"maxPostCount": 10,
"maxComments": 10,
"maxCommunitiesAndUsers": 2,
"maxLeaderBoardItems": 2,
"scrollTimeout": 40,
"proxy": {
"useRealdataAPIProxy": true
},
"debugMode": false
}
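Before starting a run, you may want to check an input object against the parameter list above. The Python sketch below transcribes that list into a small table and reports missing required fields and wrong value types. The names INPUT_SCHEMA and validate_input are hypothetical, and the check is only an approximation of whatever validation the platform performs.

```python
# Parameter name -> (expected Python type, required), transcribed from this page.
INPUT_SCHEMA = {
    "startUrls": (list, False),
    "searches": (list, False),
    "searchPosts": (bool, False),
    "searchComments": (bool, False),
    "searchCommunities": (bool, False),
    "searchUsers": (bool, False),
    "sort": (str, False),
    "time": (str, False),
    "maxItems": (int, False),
    "maxPostCount": (int, False),
    "maxComments": (int, False),
    "maxCommunitiesAndUsers": (int, False),
    "maxLeaderBoardItems": (int, False),
    "extendOutputFunction": (str, False),
    "scrollTimeout": (int, False),
    "proxy": (dict, True),
    "debugMode": (bool, False),
}


def validate_input(actor_input):
    """Return a list of problems found in actor_input (empty if none)."""
    errors = []
    for name, (expected, required) in INPUT_SCHEMA.items():
        if name not in actor_input:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        value = actor_input[name]
        # bool is a subclass of int in Python, so reject True/False for integers.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            errors.append(f"{name} should be of type {expected.__name__}")
    return errors


print(validate_input({"maxItems": "10"}))
```

Fields that are not in the table are ignored on purpose, because other examples on this page use fields that the parameter list does not describe.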