
YouTube Video Data Scraper - YouTube Video Data Collection


Use YouTube Video Data Scraper to scrape and store channel names, subscriber counts, video views, likes, comments, and more. It is a replacement for the YouTube API without quotas or limits. Use it in Australia, Germany, France, Canada, Singapore, Italy, Spain, Brazil, Mexico, UAE, USA, UK, and other countries.

What is YouTube Video Data Scraper and How Does it Work?

It is a simple, easy-to-use scraping tool that lets you work around the limits of the official YouTube Data API. It crawls YouTube directly to extract video information, so it is not bound by the API's quota units. It provides unlimited data on:

  • YouTube search result lists for your chosen search queries.
  • YouTube channel details, including videos, descriptions, subscriber counts, and more.
  • Individual videos, including like counts, release dates, view counts, duration, comments, URLs, descriptions, and more.

How to Scrape Video Data from YouTube?

The YouTube video metadata scraper lets you extract YouTube data using video URLs, search terms, channels, or search result pages as input. If you provide both URLs and search terms, the scraper prioritizes the URL input.
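The priority rule above can be expressed as a small decision function. This is a minimal sketch in Python; resolve_input_mode is a hypothetical helper that mirrors the documented behavior, not part of the scraper or its client libraries.

```python
from typing import Any


def resolve_input_mode(run_input: dict[str, Any]) -> str:
    """Return which input the scraper would act on (hypothetical helper).

    Mirrors the documented rule: a non-empty startUrls list takes
    priority over searchKeywords.
    """
    if run_input.get("startUrls"):
        return "urls"
    if run_input.get("searchKeywords"):
        return "search"
    raise ValueError("Provide either startUrls or searchKeywords")


both = {
    "startUrls": [{"url": "https://www.youtube.com/watch?v=oxy8udgWRmo"}],
    "searchKeywords": "terminator dark fate trailer",
}
print(resolve_input_mode(both))  # urls
```

Note that an empty startUrls list (as in the default input) falls through to the search keywords.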

Why Scrape YouTube Video Data?

  • Track the market: monitor where content ranks in search results, follow brand mentions, and get insights into competitor actions.
  • Discover illegal or harmful content and comments.
  • Filter your search outputs using advanced criteria.
  • Discover the latest trends and opinions from user comments and content creators.
  • Collect video subtitles to improve accessibility or enable offline reading.
  • Compile product and service information from relevant videos to automate purchasing decisions.
  • Use it as a YouTube video analytics tool.

Can I Scrape YouTube Video Data Legally?

You can scrape YouTube legally as long as you follow regulations on personal data and copyright. Our YouTube scraper handles privacy consent dialogs and cookies on your behalf, so be aware that your output may contain some personal data.

GDPR and other regulations worldwide protect personal data. You should scrape personal data only for a legitimate purpose.

If you are unsure whether your reason for scraping is legitimate, consult your lawyer before starting YouTube video data collection.

What is the cost of using YouTube Video Data Scraper?

Our free plan includes 5 USD of platform credit per month, which is enough to scrape around 2,000 YouTube items. To extract YouTube video data at scale, visit our pricing page.
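As a back-of-the-envelope check, 5 USD for around 2,000 items works out to roughly 0.0025 USD per item. The Python sketch below assumes cost grows linearly with the number of items, which is a simplification; actual costs depend on your plan and your runs.

```python
FREE_CREDIT_USD = 5.0
APPROX_ITEMS = 2_000  # "around two thousand" items, per the text above

# Rough per-item cost implied by the free plan: 5 / 2000 = 0.0025 USD
cost_per_item = FREE_CREDIT_USD / APPROX_ITEMS


def estimate_cost_usd(items: int) -> float:
    """Back-of-the-envelope estimate assuming cost grows linearly with items."""
    return items * cost_per_item


print(f"{cost_per_item:.4f} USD per item")  # 0.0025 USD per item
print(f"{estimate_cost_usd(50_000):.2f} USD for 50,000 items")  # 125.00 USD for 50,000 items
```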

Do I Need to Use a Proxy Server to Scrape YouTube Data?

As with other social media scrapers on our platform, you need a proxy server for YouTube Video Data Scraper to collect data smoothly. You can set up your own proxy or use our default proxy. Note that datacenter proxies cannot be used with this scraper.

Input Parameters of YouTube Video Data Scraper

You can provide input as JSON or through the user-friendly interface in your console account. YouTube Video Scraper recognizes the following input fields:

  • startUrls - URLs of YouTube videos, channels, or search result pages to scrape.
  • searchKeywords - YouTube search terms to use instead of a link.
  • maxResults - the maximum number of videos to scrape for each channel or search term.
  • maxComments - the maximum number of comments to extract from each video.
  • subtitlesLanguage - export only subtitles in the selected language.
  • downloadSubtitles - scrape auto-generated or user-created captions and convert them to .srt format.
  • preferAutoGeneratedSubtitles - prioritize auto-generated (speech-to-text) subtitles over user-created ones.
  • proxyConfiguration - proxy server settings.
  • saveSubsToKVS - store the scraped video subtitles in a key-value store on our platform.
  • verboseLog - enable verbose logging to track scraper runs in more detail.

See the scraper's input tab for details on each input parameter.

Here are JSON examples for different input types.

Scraping YouTube Video Data by URL

Input a search result page, video link, or YouTube channel:

{
  "downloadSubtitles": false,
  "preferAutoGeneratedSubtitles": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "saveSubsToKVS": false,
  "simplifiedInformation": false,
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=oxy8udgWRmo"
    }
  ],
  "verboseLog": false
}

Scraping YouTube Video Data Using Search Queries

Enter the same keywords you would type into YouTube's search bar:

{
  "downloadSubtitles": false,
  "maxResults": 10,
  "preferAutoGeneratedSubtitles": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "saveSubsToKVS": false,
  "searchKeywords": "terminator dark fate trailer",
  "simplifiedInformation": false,
  "verboseLog": false
}

Output Sample of YouTube Scraper

When the run finishes, you can export the data in several formats, including RSS, HTML, XML, CSV, and JSON. Here is an example output in JSON format.

{
  "title": "Terminator: Dark Fate - Official Trailer (2019) - Paramount Pictures",
  "id": "oxy8udgWRmo",
  "url": "https://www.youtube.com/watch?v=oxy8udgWRmo",
  "viewCount": 19826925,
  "date": "2019-08-29T00:00:00+00:00",
  "likes": 144263,
  "dislikes": null,
  "location": "DOUBLE DOSE CAFÉ",
  "channelName": "Paramount Pictures",
  "channelUrl": "https://www.youtube.com/c/paramountpictures",
  "numberOfSubscribers": 2680000,
  "duration": "2:34",
  "commentsCount": 25236,
  "details": "<span dir=\"auto\" class=\"style-sco..."
}
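Once you have downloaded items like the one above, you can derive simple metrics from them. This Python sketch assumes duration uses the "M:SS" or "H:MM:SS" form seen in the sample and treats dislikes as possibly null; duration_to_seconds is a hypothetical helper.

```python
import json

# Numeric fields copied from the sample output above ("details" omitted).
sample = json.loads("""
{
  "id": "oxy8udgWRmo",
  "viewCount": 19826925,
  "likes": 144263,
  "dislikes": null,
  "duration": "2:34",
  "commentsCount": 25236
}
""")


def duration_to_seconds(duration: str) -> int:
    """Convert an assumed 'M:SS' or 'H:MM:SS' string to seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Likes per 1,000 views, a simple engagement ratio.
likes_per_1k_views = sample["likes"] / sample["viewCount"] * 1000

print(duration_to_seconds(sample["duration"]))  # 154
print(round(likes_per_1k_views, 2))
print(sample["dislikes"] is None)  # True, so dislikes may be null
```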

Important Notes on Customizing the YouTube Data Scraper

Extend Output Function

The extendOutputFunction lets you remove results, drop properties, or add new ones, either by reshaping the output item or by also using the page variable. For example:

// Remove information from the item
async ({ item }) => {
    item.details = undefined; // or: delete item.details;
    return item;
}

// Add more info, in this case the short link for the video
async ({ item, page }) => {
    const shortLink = await page.evaluate(() => {
        const link = document.querySelector('link[rel="shortlinkUrl"]');
        return link ? link.href : undefined;
    });
    return { ...item, shortLink };
}

// Omit the item entirely by returning null
async ({ item }) => {
    return null;
}
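If you would rather keep the scraper's default output and reshape items after downloading them, the same three transformations can be applied client-side. This is a Python sketch under that assumption; the helper names are hypothetical, and add_short_link builds the link as https://youtu.be/<id> from the video id rather than reading it from the page, which is an assumption about YouTube's short-link form.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]


def drop_details(item: Item) -> Item:
    """Like the first function: return a copy without the 'details' property."""
    return {key: value for key, value in item.items() if key != "details"}


def add_short_link(item: Item) -> Item:
    """Like the second function, but builds the link from the video id.

    Assumes the https://youtu.be/<id> form instead of reading
    link[rel="shortlinkUrl"] from the page.
    """
    return {**item, "shortLink": f"https://youtu.be/{item['id']}"}


def omit_item(item: Item) -> Optional[Item]:
    """Like the third function: returning None drops the item."""
    return None


def apply_to_items(items: list[Item], fn: Callable[[Item], Optional[Item]]) -> list[Item]:
    """Apply fn to every item and keep only the non-None results."""
    return [result for result in map(fn, items) if result is not None]


items = [{"id": "oxy8udgWRmo", "title": "Terminator: Dark Fate", "details": "<span ...>"}]
print(apply_to_items(items, add_short_link)[0]["shortLink"])  # https://youtu.be/oxy8udgWRmo
```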

Extend Scraper Function

The extendScraperFunction lets you add functionality on top of the scraper's default behavior. For instance, you can enqueue related YouTube videos without adding them recursively:

async ({ page, request, requestQueue, customData, Apify }) => {
    // Only expand related videos from original (non-related) detail pages,
    // so related videos are not enqueued recursively
    if (request.userData.label === 'DETAIL' && !request.userData.isRelated) {
        await page.waitForSelector('ytd-watch-next-secondary-results-renderer');
        const related = await page.evaluate(() => {
            return [...document.querySelectorAll('ytd-watch-next-secondary-results-renderer a[href*="watch?v="]')]
                .map((a) => a.href);
        });
        for (const url of related) {
            await requestQueue.addRequest({
                url,
                userData: { label: 'DETAIL', isRelated: true },
            });
        }
    }
}

NB: If the function above throws an exception, the scraper will retry the same video link repeatedly.
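Related-video links can point to the same video through different URLs, for example with extra t or list query parameters. One way to avoid enqueuing duplicates is to normalize each link to its video ID before adding it to the request queue. The Python sketch below shows only that normalization; video_id and dedupe_watch_urls are hypothetical helpers, and the logic assumes the watch?v= and youtu.be/ link forms only.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from a watch?v= or youtu.be/ link (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/") or None
    ids = parse_qs(parsed.query).get("v")
    return ids[0] if ids else None


def dedupe_watch_urls(urls: list[str]) -> list[str]:
    """Return one canonical watch URL per video, preserving first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        vid = video_id(url)
        if vid and vid not in seen:
            seen.add(vid)
            result.append(f"https://www.youtube.com/watch?v={vid}")
    return result


related = [
    "https://www.youtube.com/watch?v=oxy8udgWRmo&t=30s",
    "https://www.youtube.com/watch?v=oxy8udgWRmo",
    "https://youtu.be/oxy8udgWRmo",
]
print(dedupe_watch_urls(related))
# ['https://www.youtube.com/watch?v=oxy8udgWRmo']
```

In the JavaScript extendScraperFunction, the same idea would mean enqueuing the canonical URL instead of the raw a.href value.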

Do you need to scrape other social media and video data?

We offer both dedicated and general-purpose scrapers for video and social media data from various platforms. Visit the store page and filter by the video or social media category to find the relevant scraper.

YouTube Video Scraper with Integrations

Lastly, you can connect the YouTube video data scraper to any web application or cloud service using integrations on our platform, including Slack, Google Drive, GitHub, Zapier, Airbyte, Google Sheets, Make, and others. You can also use webhooks to trigger an action when an event occurs, such as receiving an alert when a YouTube video data crawler run succeeds.

Using YouTube Video Scraper with the Real Data API Actor

The Real Data API gives you programmatic access to our platform. It is organized around RESTful HTTP endpoints that let you schedule, run, and manage scrapers. It also lets you track performance, create and update scraper versions, retrieve outputs, access datasets, and more.

You can use our client NPM and PyPI packages to access the actor from Node.js and Python, respectively. Visit the scraper's API tab to see sample code.

Share Your Feedback

Our team constantly works to improve scraper performance. If you find a bug or have technical suggestions or feedback, you can open an issue from the Issues tab in your console account.


Code Examples

To run the following examples, you need a Real Data API account. Replace <YOUR_API_TOKEN> in the code with your API token. See the Real Data API docs for more details on the live APIs.

Node.js:

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": async ({ data, item, page, request, customData }) => {
      return item; 
    },
    "extendScraperFunction": async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
     
    },
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("bernardo/youtube-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

Python:

from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": """async ({ data, item, page, request, customData }) => {
  return item; 
}""",
    "extendScraperFunction": """async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
 
}""",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyCountry": "US",
    },
}

# Run the actor and wait for it to finish
run = client.actor("bernardo/youtube-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

Shell (cURL):

# Set API token (quoted so the shell does not treat < and > as redirections)
API_TOKEN='<YOUR_API_TOKEN>'

# Prepare actor input
cat > input.json <<'EOF'
{
  "searchKeywords": "Crawlee",
  "maxResults": 10,
  "maxResultsShorts": 10,
  "maxResultStreams": 10,
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "customData": {},
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "US"
  }
}
EOF

# Run the actor
curl "https://api.apify.com/v2/acts/bernardo~youtube-scraper/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
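Hand-escaping the function strings in input.json (the \n sequences above) is error-prone. A minimal alternative, sketched in Python, is to build the same input as a dictionary and let json.dumps do the escaping; the file name input.json matches the shell example, and the rest of the shell workflow stays the same.

```python
import json

# Same input as the shell example; the extend functions are ordinary
# multi-line strings, and json.dumps escapes their newlines for us.
run_input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": """async ({ data, item, page, request, customData }) => {
  return item;
}""",
    "extendScraperFunction": """async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
}""",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {"useApifyProxy": True, "apifyProxyCountry": "US"},
}

payload = json.dumps(run_input, indent=2)

# Write the file that `curl -d @input.json` sends.
with open("input.json", "w", encoding="utf-8") as f:
    f.write(payload)
```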

Search Term

searchKeywords Optional String

Enter a search query, just as you would in YouTube's search bar.

Maximum Search Results

maxResults Optional Integer

Limits the number of videos to extract. Leave the field blank to get all results.

Direct URLs

startUrls Optional Array

Enter YouTube video, search result page, or channel URLs. You can also upload a Google Sheet or a CSV file containing the URL list.

Important Note: if you use this input field, the scraper will ignore the search query input.
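If you call the scraper through the API rather than uploading a file in the console, you can build the startUrls array from a CSV file yourself. This Python sketch assumes the CSV has a header column named url; start_urls_from_csv is a hypothetical helper.

```python
import csv
import io


def start_urls_from_csv(csv_text: str, column: str = "url") -> list[dict[str, str]]:
    """Build a startUrls array from CSV text (assumes a 'url' header column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip rows whose URL cell is empty
            urls.append({"url": value})
    return urls


csv_text = """url
https://www.youtube.com/watch?v=oxy8udgWRmo
https://www.youtube.com/c/paramountpictures
"""
print(start_urls_from_csv(csv_text))
```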

Only Collect Basic Channel Information

simplifiedInformation Optional Boolean

If set to true, the tool collects only the data available on the channel page, so data for individual videos will be limited.

Save Short Videos

saveShorts Optional Boolean

If set to true, the scraper saves short videos from the selected channel.

Maximum Shorts Videos

maxResultsShorts Optional Integer

Sets the maximum number of short videos to scrape from the selected YouTube channel. Leave the field empty to scrape all short videos.

Save Streams

saveStreams Optional Boolean

If set to true, the scraper saves live-stream videos from the selected YouTube channel.

Maximum Streams

maxResultStreams Optional Integer

Sets the maximum number of stream videos to extract from the selected channel. Leave the field blank to get all results.

Maximum Comments

maxComments Optional Integer

Sets the maximum number of comments to extract from each video. Leave the field blank or enter 0 if you don't want to scrape any comments.

Download Subtitles

downloadSubtitles Optional Boolean

If set to true, the tool downloads video subtitles and converts them to .srt format.
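For reference, an .srt file is a sequence of numbered cues, each with an HH:MM:SS,mmm --> HH:MM:SS,mmm time range, the caption text, and a blank line separating it from the next cue. The Python sketch below illustrates that layout; it is not the scraper's own converter, and the Cue type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical caption cue with times in seconds."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n{cue.text}\n"
        )
    return "\n".join(blocks)


print(to_srt([Cue(0, 2.5, "Hello"), Cue(2.5, 5, "World")]))
```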

Store Video Subtitles to Key-Value Store

saveSubsToKVS Optional Boolean

If set to true, the crawler stores the downloaded subtitles in the key-value store.

Important Note: you must enable downloadSubtitles to use this option.

Subtitle Language

subtitlesLanguage Optional Enum

Selects the language of the subtitles to download.

Important Note: you must enable downloadSubtitles to use this option.

Options:

  • en string
  • de string
  • it string
  • fr string
  • pt string
  • ko string
  • ja string
  • ru string
  • nl string
  • es string

Choose Autogenerated Subtitles

preferAutoGeneratedSubtitles Optional Boolean

If set to true, the scraper prefers auto-generated video subtitles. Note that you must select a subtitle language to use this option.

Extend Output Function

extendOutputFunction Optional String

Adds or removes properties on result objects, or omits a result entirely by returning null.

Extend Scraper Function

extendScraperFunction Optional String

An advanced function that lets you extend the default scraper's functionality, for example by performing page actions manually.

Custom Data

customData Optional Object

Any custom data you want to pass to the extend scraper or extend output function.

Handle Page Timeout

handlePageTimeoutSecs Optional Integer

Sets the page handling timeout, in seconds.

Proxy Configuration

proxyConfiguration Required Object

Use your own proxies or proxies from our platform.

Verbose Log

verboseLog Optional Boolean

If set to true, the scraper outputs detailed (verbose) logs.

{
  "searchKeywords": "Crawlee",
  "maxResults": 10,
  "startUrls": [],
  "simplifiedInformation": false,
  "saveShorts": false,
  "maxResultsShorts": 10,
  "saveStreams": false,
  "maxResultStreams": 10,
  "maxComments": 0,
  "subtitlesLanguage": "en",
  "extendOutputFunction": "async ({ data, item, page, request, customData }) => {\n  return item; \n}",
  "extendScraperFunction": "async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {\n \n}",
  "customData": {},
  "handlePageTimeoutSecs": 3600,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "US"
  }
}
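Several options above depend on one another. A minimal pre-flight check for some of the documented rules could look like the following Python sketch. check_input is a hypothetical helper, not part of the scraper; it covers the required proxyConfiguration, saveSubsToKVS needing downloadSubtitles, preferAutoGeneratedSubtitles needing a subtitle language, and the listed language codes, and it also assumes the max* limits should be non-negative integers.

```python
from typing import Any

# Language codes listed under subtitlesLanguage above.
SUBTITLE_LANGUAGES = {"en", "de", "it", "fr", "pt", "ko", "ja", "ru", "nl", "es"}
LIMIT_KEYS = ("maxResults", "maxResultsShorts", "maxResultStreams", "maxComments")


def check_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in an input dict (hypothetical helper)."""
    problems = []
    if "proxyConfiguration" not in run_input:
        problems.append("proxyConfiguration is required")
    if run_input.get("saveSubsToKVS") and not run_input.get("downloadSubtitles"):
        problems.append("saveSubsToKVS requires downloadSubtitles")
    lang = run_input.get("subtitlesLanguage")
    if lang is not None and lang not in SUBTITLE_LANGUAGES:
        problems.append(f"unsupported subtitlesLanguage: {lang}")
    if run_input.get("preferAutoGeneratedSubtitles") and lang is None:
        problems.append("preferAutoGeneratedSubtitles requires subtitlesLanguage")
    for key in LIMIT_KEYS:
        value = run_input.get(key)
        # Assumption: limits are non-negative integers (booleans rejected).
        if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value < 0):
            problems.append(f"{key} must be a non-negative integer")
    return problems


example = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxComments": 0,
    "subtitlesLanguage": "en",
    "proxyConfiguration": {"useApifyProxy": True, "apifyProxyCountry": "US"},
}
print(check_input(example))  # []
```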