Disclaimer: Real Data API extracts only publicly available data and maintains a strict policy against collecting any personal or identity-related information.
Using the YouTube Video Data Scraper, scrape and store the channel name, number of subscribers, video views, likes, comments, and more. It is a replacement for the YouTube API without any quota or limit. Use it in Australia, Germany, France, Canada, Singapore, Italy, Spain, Brazil, Mexico, the UAE, the USA, the UK, and other countries.
It is a simple, easy-to-use YouTube video data scraping tool that lets you go beyond the limits of YouTube's official Data API. It crawls YouTube to extract video information from the platform without any quota limits, giving you unlimited data.
The YouTube Video metadata scraper lets you extract YouTube data using video URLs, search terms, channels, or search result pages as input parameters. If you fill in all the input fields, the scraper will prioritize the URL input.
You can scrape YouTube legally, provided you follow personal data and copyright regulations. Our YouTube scraper deals with privacy consent dialogs and cookies on your behalf, so be aware that some personal data may end up in your video data scraper output.
GDPR and other regulations worldwide protect personal data. You should only scrape personal data if you have a legitimate reason to do so.
If you are unsure whether your reason for scraping is legitimate, please seek advice from a lawyer before starting YouTube video data collection.
Our free plan includes 5 USD of monthly platform credit, which is enough to scrape around two thousand YouTube items. Visit our pricing page to extract YouTube video data at scale.
As with the other social media data scrapers on our platform, a proxy server is essential for scraping data smoothly with the YouTube Video Data Extractor. You can set up your own proxy or use our default proxy. However, you cannot use a data center proxy server to run this YouTube video scraping tool.
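For example, a minimal proxy configuration that routes the scraper through a residential pool could look like the fragment below. This is a sketch only: the "RESIDENTIAL" group name is an assumption based on the platform's default residential proxy group, so check which proxy groups your account can access.
{
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}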
You can provide JSON input or use the user-friendly interface in your console account. The YouTube Video Scraper recognizes the input fields described in the reference below.
Visit the input tab of the scraper to learn about the input parameters of the YouTube video analysis tool in detail.
For different input types, here are a few JSON examples.
Input a search result page, video link, or YouTube channel:
{ "downloadSubtitles": false, "preferAutoGeneratedSubtitles": false, "proxyConfiguration": { "useApifyProxy": true }, "saveSubsToKVS": false, "simplifiedInformation": false, "startUrls": [ { "url": "https://www.youtube.com/watch?v=oxy8udgWRmo" } ], "verboseLog": false }
Or feed in the search keywords you would normally type into YouTube to find the video:
{ "downloadSubtitles": false, "maxResults": 10, "preferAutoGeneratedSubtitles": false, "proxyConfiguration": { "useApifyProxy": true }, "saveSubsToKVS": false, "searchKeywords": "terminator dark fate trailer", "simplifiedInformation": false, "verboseLog": false }
After the scraping process completes successfully, you can save and export the data in multiple formats, including RSS, HTML, XML, CSV, and JSON. Here is an example output in JSON format.
{ "title": "Terminator: Dark Fate - Official Trailer (2019) - Paramount Pictures", "id": "oxy8udgWRmo", "url": "https://www.youtube.com/watch?v=oxy8udgWRmo", "viewCount": 19826925, "date": "2019-08-29T00:00:00+00:00", "likes": 144263, "dislikes": null, "location": "DOUBLE DOSE CAFÉ", "channelName": "Paramount Pictures", "channelUrl": "https://www.youtube.com/c/paramountpictures", "numberOfSubscribers": 2680000, "duration": "2:34", "commentsCount": 25236, "details": "<span dir=\"auto\" class=\"style-sco..." }
Extend Output Function
It allows you to filter out results and add or remove output properties by changing the output shape, optionally using the page variable:
async ({ item }) => {
    // remove information from the item
    item.details = undefined; // or: delete item.details;
    return item;
}
async ({ item, page }) => {
    // add more info, in this case the shortLink for the video
    const shortLink = await page.evaluate(() => {
        const link = document.querySelector('link[rel="shortlinkUrl"]');
        if (link) {
            return link.href;
        }
    });
    return {
        ...item,
        shortLink,
    };
}
async ({ item }) => {
    // omit the item entirely by returning null
    return null;
}
Extend Scraper Function
It allows you to add functionality on top of the baseline scraper behavior. For instance, you can enqueue related YouTube videos without adding them recursively.
async ({ page, request, requestQueue, customData, Apify }) => {
    if (request.userData.label === 'DETAIL' && !request.userData.isRelated) {
        await page.waitForSelector('ytd-watch-next-secondary-results-renderer');
        const related = await page.evaluate(() => {
            return [...document.querySelectorAll('ytd-watch-next-secondary-results-renderer a[href*="watch?v="]')].map((a) => a.href);
        });
        for (const url of related) {
            await requestQueue.addRequest({
                url,
                userData: {
                    label: 'DETAIL',
                    isRelated: true,
                },
            });
        }
    }
}
NB: if an exception is thrown, the function above may retry the same video link repeatedly.
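If you want to prevent those retries, a minimal defensive variant is to catch errors inside the function, so a transient failure does not make the scraper re-process the page. This is a sketch using the same context object as above:
async ({ page, request, requestQueue, customData, Apify }) => {
    if (request.userData.label === 'DETAIL' && !request.userData.isRelated) {
        try {
            await page.waitForSelector('ytd-watch-next-secondary-results-renderer');
            const related = await page.evaluate(() => {
                return [...document.querySelectorAll('ytd-watch-next-secondary-results-renderer a[href*="watch?v="]')].map((a) => a.href);
            });
            for (const url of related) {
                await requestQueue.addRequest({
                    url,
                    userData: { label: 'DETAIL', isRelated: true },
                });
            }
        } catch (err) {
            // Swallow the error so the same video link is not retried indefinitely
            console.warn(`Could not enqueue related videos for ${request.url}: ${err.message}`);
        }
    }
}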
We have dedicated and general scrapers to help you scrape video and social media data from various platforms. Visit the store page and filter by the video or social media category to find the relevant scraper.
Lastly, you can connect the YouTube video data scraper with almost any web application or cloud service using the integrations on our platform. You can integrate the scraper with Slack, Google Drive, GitHub, Zapier, Airbyte, Google Sheets, Make, and other platforms. You can also use webhooks to trigger an action when an event occurs, such as getting an alert when the YouTube video data crawler finishes successfully.
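As a sketch of the webhook route, the snippet below registers a webhook through the apify-client package that calls your endpoint whenever a run of the scraper succeeds; the actor ID and requestUrl are placeholders:
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

(async () => {
    // Fire whenever a run of the actor finishes successfully
    const webhook = await client.webhooks().create({
        eventTypes: ['ACTOR.RUN.SUCCEEDED'],
        condition: { actorId: '<ACTOR_ID>' }, // placeholder: the scraper's actor ID
        requestUrl: 'https://example.com/youtube-run-finished', // placeholder: your endpoint
    });
    console.log(`Created webhook ${webhook.id}`);
})();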
Our actor provides programmatic access to the platform. The API is organized around RESTful HTTP endpoints that allow you to schedule, run, and manage actors. It also lets you track performance, create and update scraper versions, retrieve outputs, access datasets, and more.
You can use our client NPM and PyPI packages to access the actor from Node.js and Python, respectively. Visit the API tab of the scraper for sample code.
Our team constantly works to improve scraper performance. If you find a bug or have technical suggestions or feedback, you can create an issue from the issues tab in your console account.
You need a Real Data API account to execute the program examples. Replace <YOUR_API_TOKEN> in the code with your API token. For more explanation, read about the live APIs in the Real Data API docs.
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare the actor input. Note that the extend functions are passed as
// strings so that they survive JSON serialization of the input.
const input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": `async ({ data, item, page, request, customData }) => {
        return item;
    }`,
    "extendScraperFunction": `async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
    }`,
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("bernardo/youtube-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": """async ({ data, item, page, request, customData }) => {
        return item;
    }""",
    "extendScraperFunction": """async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
    }""",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyCountry": "US",
    },
}

# Run the actor and wait for it to finish
run = client.actor("bernardo/youtube-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
# Set API token
API_TOKEN=<YOUR_API_TOKEN>
# Prepare actor input
cat > input.json <<'EOF'
{
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n return item; \n}",
    "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
    }
}
EOF

# Run the actor
curl "https://api.apify.com/v2/acts/bernardo~youtube-scraper/runs?token=$API_TOKEN" \
    -X POST \
    -d @input.json \
    -H 'Content-Type: application/json'
searchKeywords
Optional String
Enter a search query, just as you would type it into YouTube's search bar.
maxResults
Optional Integer
Limit the number of videos you want to extract. Leave the input field blank to get all results.
startUrls
Optional Array
Provide a YouTube video URL, search result page, or channel URL. You can also upload a Google Sheet or a CSV file with a list of URLs.
Important Note: if you use this input field, the scraper will ignore the search query input.
simplifiedInformation
Optional Boolean
If set to true, the tool will only collect the data available on the channel page, and the data for individual videos will be limited.
saveShorts
Optional Boolean
If set to true, the scraper will store short videos (Shorts) from the selected channel.
maxResultsShorts
Optional Integer
Set the maximum number of Shorts to scrape from the selected YouTube channel. Leave the field empty to scrape them all.
saveStreams
Optional Boolean
If set to true, the scraper will store live-stream videos from the selected YouTube channel.
maxResultStreams
Optional Integer
Set the maximum number of live-stream videos to extract from the selected channel. Leave the input field blank for unlimited results.
maxComments
Optional Integer
Set the maximum number of comments to extract per video. Leave the input field blank or set it to zero if you don't want to scrape any comments.
downloadSubtitles
Optional Boolean
If set to true, the tool will download video subtitles and convert them to the .srt format.
saveSubsToKVS
Optional Boolean
If set to true, the crawler will store the downloaded subtitles in the key-value store.
Important Note: you must enable downloadSubtitles to use this option.
subtitlesLanguage
Optional Enum
The language of the video subtitles to download.
Important Note: you must enable downloadSubtitles to use this option.
Options:
preferAutoGeneratedSubtitles
Optional Boolean
If set to true, the scraper will prefer auto-generated video subtitles. Remember that you must select a subtitle language to use this option; a combined subtitle example follows this parameter list.
extendOutputFunction
Optional String
Add or remove properties on the result objects, or skip a result entirely by returning null.
extendScraperFunction
Optional String
It is an advanced function that permits you to expand the functionality of the default scraper. It allows you to perform page actions manually.
customData
Optional Object
Any data you wish to pass to the extend scraper or extend output functions.
handlePageTimeoutSecs
Optional Integer
The timeout, in seconds, for handling a single page.
proxyConfiguration
Required Object
Use custom proxies or try relevant proxies from our platform.
verboseLog
Optional Boolean
If set to true, the scraper prints verbose log output.
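For example, to download English subtitles for a video and keep a copy in the key-value store, you could combine the subtitle fields like this (a minimal sketch; remember that saveSubsToKVS and subtitlesLanguage only take effect when downloadSubtitles is true):
{
    "startUrls": [
        { "url": "https://www.youtube.com/watch?v=oxy8udgWRmo" }
    ],
    "downloadSubtitles": true,
    "saveSubsToKVS": true,
    "subtitlesLanguage": "en",
    "preferAutoGeneratedSubtitles": false,
    "proxyConfiguration": { "useApifyProxy": true }
}
For reference, here is a complete input example with all fields populated: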
{
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "startUrls": [],
    "simplifiedInformation": false,
    "saveShorts": false,
    "maxResultsShorts": 10,
    "saveStreams": false,
    "maxResultStreams": 10,
    "maxComments": 0,
    "subtitlesLanguage": "en",
    "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n return item; \n}",
    "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
    }
}