
Twitter Scraper - Twitter Profile Extractor

RealdataAPI / twitter-data-scraper

Scrape Twitter data about users, including user profiles, follower counts, followings, hashtags, tweets, retweets, threads, images, statistics, videos, history, replies, and other data fields using Twitter Data Scraper. Our Twitter data scraper is accessible in multiple countries, including Canada, France, Australia, Germany, the USA, the UK, Spain, and more.

Which Twitter Data Can This Twitter Scraper Extract?

Twitter Scraper loads the Twitter URLs and profiles you specify and scrapes the data below.

  • User data such as Twitter username, follower count, following count, location, profile images, banner, etc.
  • Retweets, lists of tweets, and profile replies.
  • Search hashtags and get top tweets, latest tweets, people, pictures, or video tweets.
  • Insights for every tweet, including replies, favorites, retweets, etc.

Twitter Scraper on our platform allows you to scrape Twitter data at scale. It also lets you scrape more data than the official Twitter API, because you don't need a registered application, a Twitter account, or an API key, and it has no restrictions.

You can feed the scraper a list of Twitter handles or use Twitter URLs such as trending topics, searches, or hashtags.

Why Use Real Data API Twitter Scraper?

Crawling the Twitter platform will give you access to over five hundred million tweets daily. You can collect any required data in multiple ways.

  • Monitor discussions about your city, country, products, or brand.
  • Observe attitudes, new trends, and fashions as they enter the market.
  • Track your competitors to check their popularity and how to beat them.
  • Monitor market and investor sentiments to ensure the safety of your investments.
  • Use Twitter information to train your artificial intelligence and machine learning prototypes for academic research.
  • Study customer habits, target underdeveloped fields, or develop new products depending on customer pain points.
  • Spot fake news by learning patterns of how people spread fake information.
  • Discover discussions about services and travel destinations, and make the best use of local knowledge.

How to Use Twitter Scraper?

To learn more about using this Twitter Scraper, check out our stepwise tutorial or watch the video.

Can I Scrape Twitter Data Legally?

Yes, you can extract publicly available data from Twitter. But note that your output may include personal data. GDPR and similar regulations worldwide protect personal data and don't allow you to extract it without a legitimate reason or prior consent. Consult your lawyer if you are confused or unsure whether your reason is legitimate.

Do You Want More Options to Scrape Twitter Data?

If you wish to extract specific Twitter data quickly, try the targeted Twitter data scraper options below.

Tips & Tricks


The scraper searches by query by default, but you can also feed it Twitter URLs or Twitter handles. If you plan to use the URL option, check the allowed URL types below.

Logging In Using Cookies

The cookie login option lets the scraper reuse the session cookies of an already logged-in user. With this option, the scraper tries to avoid getting blocked by the source platform; for example, it reduces its running speed and introduces random delays between actions.

We strongly recommend that you don't use your personal account to run the scraper unless there is no other option. Instead, create a new Twitter account so that Twitter won't ban your personal one.

Use a Chrome extension such as EditThisCookie to log in using existing cookies. Once you install it, open Twitter in your browser, log in with your credentials, and export the cookies using the extension. It will give you a cookie array to paste into the Login Cookies input field.
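
For reference, the exported cookie array looks roughly like the trimmed sketch below. The exact fields and cookie names depend on your session; the auth_token entry here is illustrative only. Paste the full exported array, unchanged, into the Login Cookies (initialCookies) input field.

[
    {
        "name": "auth_token",
        "value": "<YOUR_SESSION_COOKIE_VALUE>",
        "domain": ".twitter.com",
        "path": "/",
        "secure": true,
        "httpOnly": true
    }
]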

If you log out of the Twitter account that provided the cookies, they will be invalidated and the scraper will stop its execution.

Check out the video tutorial below to see how it works.

Input Parameters

Here are the input parameters for Twitter Scraper API.


Twitter Data Output

You can export the scraped dataset in multiple digestible formats like CSV, JSON, Excel, or HTML. Each item in the scraped dataset represents a single tweet in the following format.


  

[{
    "user": {
        "protected": false,
        "created_at": "2009-06-02T20:12:29.000Z",
        "default_profile_image": false,
        "description": "",
        "fast_followers_count": 0,
        "favourites_count": 19158,
        "followers_count": 130769125,
        "friends_count": 183,
        "has_custom_timelines": true,
        "is_translator": false,
        "listed_count": 117751,
        "location": "",
        "media_count": 1435,
        "name": "Elon Musk",
        "normal_followers_count": 130769125,
        "possibly_sensitive": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
        "screen_name": "elonmusk",
        "statuses_count": 23422,
        "translator_type": "none",
        "verified": true,
        "withheld_in_countries": [],
        "id_str": "44196397"
    },
    "id": "1633026246937546752",
    "conversation_id": "1632363525405392896",
    "full_text": "@MarkChangizi Sweden’s steadfastness was incredible!",
    "reply_count": 243,
    "retweet_count": 170,
    "favorite_count": 1828,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
        {
            "id_str": "49445813",
            "name": "Mark Changizi",
            "screen_name": "MarkChangizi"
        }
    ],
    "urls": [],
    "media": [],
    "url": "https://twitter.com/elonmusk/status/1633026246937546752",
    "created_at": "2023-03-07T08:46:12.000Z",
    "is_quote_tweet": false,
    "replying_to_tweet": "https://twitter.com/MarkChangizi/status/1632363525405392896",
    "startUrl": "https://twitter.com/elonmusk/with_replies"
},
{
    "user": {
        "protected": false,
        "created_at": "2009-06-02T20:12:29.000Z",
        "default_profile_image": false,
        "description": "",
        "fast_followers_count": 0,
        "favourites_count": 19158,
        "followers_count": 130769125,
        "friends_count": 183,
        "has_custom_timelines": true,
        "is_translator": false,
        "listed_count": 117751,
        "location": "",
        "media_count": 1435,
        "name": "Elon Musk",
        "normal_followers_count": 130769125,
        "possibly_sensitive": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
        "screen_name": "elonmusk",
        "statuses_count": 23422,
        "translator_type": "none",
        "verified": true,
        "withheld_in_countries": [],
        "id_str": "44196397"
    },
    "id": "1633021151197954048",
    "conversation_id": "1632930485281120256",
    "full_text": "@greg_price11 @Liz_Cheney @AdamKinzinger @RepAdamSchiff Besides misleading the public, they withheld evidence for partisan political reasons that sent people to prison for far more serious crimes than they committed.\n\nThat is deeply wrong, legally and morally.",
    "reply_count": 727,
    "retweet_count": 2458,
    "favorite_count": 10780,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
        {
            "id_str": "896466491587080194",
            "name": "Greg Price",
            "screen_name": "greg_price11"
        },
        {
            "id_str": "98471035",
            "name": "Liz Cheney",
            "screen_name": "Liz_Cheney"
        },
        {
            "id_str": "18004222",
            "name": "Adam Kinzinger #fella",
            "screen_name": "AdamKinzinger"
        },
        {
            "id_str": "29501253",
            "name": "Adam Schiff",
            "screen_name": "RepAdamSchiff"
        }
    ],
    "urls": [],
    "media": [],
    "url": "https://twitter.com/elonmusk/status/1633021151197954048",
    "created_at": "2023-03-07T08:25:57.000Z",
    "is_quote_tweet": false,
    "replying_to_tweet": "https://twitter.com/greg_price11/status/1632930485281120256",
    "startUrl": "https://twitter.com/elonmusk/with_replies"
}]

...
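
If you want to pull the dataset in one of the formats mentioned above programmatically, here is a minimal hedged sketch. The /v2/datasets/{id}/items endpoint and its format parameter are assumptions modeled on the /v2 endpoints used later in this document; check the Real Data API reference for the exact parameters.

// Hedged sketch: download a run's dataset as CSV over HTTP.
// The `format` parameter is an assumption; verify it in the API reference.
(async () => {
    const datasetId = '<YOUR_DATASET_ID>'; // e.g. run.defaultDatasetId from a finished run
    const token = '<YOUR_API_TOKEN>';

    const res = await fetch(`https://api.RealdataAPI.com/v2/datasets/${datasetId}/items?format=csv&token=${token}`);
    console.log(await res.text()); // CSV rows, one per scraped tweet
})();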

Search Using the Advanced Search Feature

Use a pre-built Advanced Search query as a start URL, for example: https://twitter.com/search?q=cool%20until%3A2021-01-01&src=typed_query

Workaround for the Maximum Tweets Limit

Twitter returns only 3200 tweets per search or profile by default. If you require more tweets than this limit, you can split your start URLs into time slices, as in the URL samples below.

  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-03-01%20until%3A2020-04-01&src=typed_query&f=live
  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-02-01%20until%3A2020-03-01&src=typed_query&f=live
  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-01-01%20until%3A2020-02-01&src=typed_query&f=live

Each link is from the same account, Elon Musk, but split into 30-day slices: January, February, and March 2020. You can build such links with Twitter's advanced search at https://twitter.com/search. For accounts that don't post regularly, you can use larger time intervals.
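
As a convenience, here is a minimal sketch that generates such month-by-month start URLs for one account; the handle, year, and month range are placeholders you would adjust.

// Generates monthly time-sliced search URLs so each slice stays under the
// ~3200-tweet limit. Produces the same URL shape as the samples above.
const handle = 'elonmusk';
const pad = (m) => String(m).padStart(2, '0');
const urls = [];

for (let month = 1; month <= 3; month++) {
    const since = `2020-${pad(month)}-01`;
    const until = `2020-${pad(month + 1)}-01`;
    const query = encodeURIComponent(`(from:${handle}) since:${since} until:${until}`);
    urls.push(`https://twitter.com/search?q=${query}&src=typed_query&f=live`);
}

console.log(urls); // feed these as start URLs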

Other restrictions include:

  • Live tweets are capped at about one day in the past.
  • Most search result types, such as Top, Pictures, and Videos, are capped at about 150 tweets.

Extend Output Function

This output parameter function allows you to change your dataset output shape, split data arrays into different items, or categorize the output.


  

async ({ data, item, request }) => {
    item.user = undefined; // removes this field from the output
    delete item.user; // this works as well

    const raw = data.tweets[item['#sort_index']]; // allows you to access the raw data

    item.source = raw.source; // adds "Twitter for ..." to the output

    if (request.userData.search) {
        item.search = request.userData.search; // add the search term to the output
        item.searchUrl = request.loadedUrl; // add the raw search URL to the output
    }

    return item;
}

  

Item filtering:


  

async ({ item }) => {
    if (!item.full_text.includes('lovely')) {
        return null; // omit the output if the tweet body doesn't contain the text
    }

    return item;
}

  

Separating into multiple data items and changing the entire result:


  

async ({ item }) => {
    // the dataset will be full of items like { hashtag: '#somehashtag' }
    // returning an array here will split it into multiple dataset items
    return item.hashtags.map((hashtag) => {
        return { hashtag: `#${hashtag}` };
    });
}

  

Extend Scraper Function

This function lets you extend the scraper's behavior without maintaining a custom version of it. For instance, you can add a trending-topics search on every page visit.


  

async ({ page, request, addSearch, addProfile, addThread, customData }) => {
    await page.waitForSelector('[aria-label="Timeline: Trending now"] [data-testid="trend"]');

    const trending = await page.evaluate(() => {
        const trendingEls = $('[aria-label="Timeline: Trending now"] [data-testid="trend"]');

        return trendingEls.map((_, el) => {
            return {
                term: $(el).find('> div > div:nth-child(2)').text().trim(),
                profiles: $(el).find('> div > div:nth-child(3) [role="link"]').map((_, el) => $(el).text()).get()
            };
        }).get();
    });

    for (const { term, profiles } of trending) {
        await addSearch(term); // add a search using text

        for (const profile of profiles) {
            await addProfile(profile); // adds a profile using its link
        }
    }

    // adds a thread and gets its replies; accepts an id (e.g. from conversation_id) or a URL
    // you can call this multiple times, but each thread is added only once
    await addThread("1351044768030142464");
}

  

extendScraperFunction also receives additional variables:


  

async ({ label, response, url }) => {
    if (label === 'response' && response) {
        // inside the page.on('response') callback
        if (url.includes('live_pipeline')) {
            // deal with plain text content
            const blob = await (await response.blob()).text();
        }
    } else if (label === 'before') {
        // executes before page.on('response') is attached; can be used to intercept requests/responses
    } else if (label === 'after') {
        // executes after the scraping process has finished, even on crash
    }
}

  

Twitter Scraper with Real Data API Integrations

Lastly, using Real Data API Integrations, you can connect Twitter Scraper with almost any web application or cloud service. You can connect with Google Drive, Google Sheets, Airbyte, Make, Slack, GitHub, Zapier, etc. Further, you can use Webhooks to carry out an activity once an event occurs, like an alert when Twitter Scraper completes its execution.
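
As an illustration of the webhook idea, here is a hedged sketch that registers a webhook firing when a run finishes. The /v2/webhooks endpoint, event type name, and payload shape are assumptions modeled on the platform's other /v2 endpoints, not a confirmed API; consult the Real Data API reference before relying on it.

// Hypothetical sketch: every value marked "assumed" below is not confirmed
// by this document and must be checked against the API reference.
(async () => {
    const token = '<YOUR_API_TOKEN>';

    const res = await fetch(`https://api.RealdataAPI.com/v2/webhooks?token=${token}`, { // assumed endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            eventTypes: ['ACTOR.RUN.SUCCEEDED'],                     // assumed event type name
            condition: { actorId: 'quacker~twitter-scraper' },       // assumed condition shape
            requestUrl: 'https://example.com/twitter-scraper-done'   // your own HTTP handler
        })
    });
    console.log(await res.json());
})();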

Using Twitter Scraper with Real Data API Platform

The Real Data API platform gives you programmatic access to scrapers. We have organized the Twitter Scraper API around RESTful HTTP endpoints that let you schedule, manage, and run Real Data API scrapers. The API also lets you track actor performance, create and update versions, access datasets, retrieve results, and more.

To use the scraper from Python, try our client package on PyPI; to use it from Node.js, try our client package on npm.

Check out the API tab for code examples or explore Real Data API reference documents for details.

Industries

Check out how industries use Twitter Scraper worldwide.


Marketing and Media

You need a Real Data API account to run the code examples below. Replace <YOUR_API_TOKEN> with your own API token. See the Real Data API docs for more details on the live APIs.

Node.js:

import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "searchTerms": [
        "RealdataAPI"
    ],
    "searchMode": "live",
    "profilesDesired": 10,
    "tweetsDesired": 100,
    "mode": "replies",
    "proxyConfig": {
        "useRealdataAPIProxy": true
    },
    "extendOutputFunction": async ({ data, item, page, request, customData, RealdataAPI }) => {
      return item;
    },
    "extendScraperFunction": async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, RealdataAPI, signal, label }) => {
     
    },
    "customData": {},
    "handlePageTimeoutSecs": 500,
    "maxRequestRetries": 6,
    "maxIdleTimeoutSecs": 60
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("quacker/twitter-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

Python:

from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "searchTerms": ["RealdataAPI"],
    "searchMode": "live",
    "profilesDesired": 10,
    "tweetsDesired": 100,
    "mode": "replies",
    "proxyConfig": { "useRealdataAPIProxy": True },
    "extendOutputFunction": """async ({ data, item, page, request, customData, RealdataAPI }) => {
  return item;
}""",
    "extendScraperFunction": """async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, RealdataAPI, signal, label }) => {
 
}""",
    "customData": {},
    "handlePageTimeoutSecs": 500,
    "maxRequestRetries": 6,
    "maxIdleTimeoutSecs": 60,
}

# Run the actor and wait for it to finish
run = client.actor("quacker/twitter-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

cURL:

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare actor input
cat > input.json <<'EOF'
{
  "searchTerms": [
    "RealdataAPI"
  ],
  "searchMode": "live",
  "profilesDesired": 10,
  "tweetsDesired": 100,
  "mode": "replies",
  "proxyConfig": {
    "useRealdataAPIProxy": true
  },
  "extendOutputFunction": "async ({ data, item, page, request, customData, RealdataAPI }) => {/n  return item;/n}",
  "extendScraperFunction": "async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, RealdataAPI, signal, label }) => {/n /n}",
  "customData": {},
  "handlePageTimeoutSecs": 500,
  "maxRequestRetries": 6,
  "maxIdleTimeoutSecs": 60
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/quacker~twitter-scraper/runs?token=$API_TOKEN" /
  -X POST /
  -d @input.json /
  -H 'Content-Type: application/json'

Which Search Query Do You Wish to Scrape?

searchTerms Optional Array

The scraper will discover and scrape tweets for the specific search terms you add before starting the execution. If you wish to search hashtags, begin the search term with #; for example, to search data analytics, use #dataanalytics. Otherwise, scroll down to scrape by URL or Twitter user profile.
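
For example, to combine a hashtag search with a plain keyword search, the searchTerms input could look like this:

{
    "searchTerms": ["#dataanalytics", "RealdataAPI"]
}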

Are You Looking to Filter Tweets by Content?

searchMode Optional String

This setting changes how Twitter sorts tweets before the scraper extracts them: for example, people, top tweets, videos, photos, or the latest tweets.

Options:

"top","image","user","video"

Limit People Profiles

profilesDesired Optional Integer

Limits the number of profiles to scrape. It helps when you want to extract several tweets from selected profiles.

Choose the Maximum Tweets Count for Every Search Query

tweetsDesired Optional Integer

This value allows you to set the maximum tweet count to scrape for every search term.

Get Tweet View Count

addTweetViewCount Optional Boolean

Retrieves the view count for each tweet.

Add User Data

addUserInfo Optional Boolean

Appends user data to each tweet. You can reduce the dataset size by turning this feature off.

Use Cheerio

useCheerio Optional Boolean

Instead of Puppeteer, you can use Cheerio to scrape tweets. Cheerio can scrape all tweet posts much faster.

Are You Looking to Scrape Specific Profiles from Twitter?

handle Optional Array

Feed in the specific Twitter handles you want to scrape. This shortcut saves you from adding the complete profile URL, like https://twitter.com/username.
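
For example, to scrape two profiles by handle (the second handle here is just a placeholder):

{
    "handle": ["elonmusk", "<ANOTHER_HANDLE>"]
}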

Would You Like to Scrape Twitter Replies?

mode Optional String

Choose whether to scrape only tweets or tweets together with replies. Remember that this applies only when scraping Twitter profiles.

Are You Planning to Scrape Using Twitter URL?

startUrls Optional Array

Provides the scraper with starting locations. You can add Twitter links one by one or supply a link to a single file containing multiple links.

Tweets Newer Than

toDate Optional String

Extract the latest tweets after a specific date, in YYYY-MM-DD format. Use this together with Tweets Older Than to make a time-bounded slice.

Tweets Older Than

fromDate Optional String

Extract older tweets before a specific date, in YYYY-MM-DD format. Use this together with Tweets Newer Than to make a time-bounded slice.
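
Following the field pairing documented above (toDate for "newer than", fromDate for "older than"), a one-month slice covering March 2020 would look like the sketch below; verify the pairing against the actor's input schema before relying on it.

{
    "toDate": "2020-03-01",
    "fromDate": "2020-04-01"
}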

Use Advanced Search

useAdvancedSearch Optional Boolean

Instead of the default search, try advanced search. It helps extract tweets by user handle, search term, or date range. Remember that this option doesn't scrape retweets.

Proxy Configuration

proxyConfig Required Object

Configure the proxy servers your scraper should use, for example the Real Data API proxy.

Extend Output Function

extendOutputFunction Optional String

Add or remove properties on the result object, or omit the result entirely by returning null.

Extend Scraper Function

extendScraperFunction Optional String

An advanced function that extends the default scraper functionality, enabling you to perform page actions manually.

Custom Data

customData Optional Object

Any data you want to access inside the extend output or extend scraper functions.

Maximum Timeout Seconds for Browser Scraping

handlePageTimeoutSecs Optional Integer

Increase this timeout to accommodate lengthy handlePageFunction processes.

Maximum Request Retries

maxRequestRetries Optional Integer

Set the maximum number of retry attempts per request.

Scrolling Idle Seconds

maxIdleTimeoutSecs Optional Integer

Set how long the scraper keeps scrolling without receiving new data before it stops.

Debug log

debugLog Optional Boolean

Enable the debug log.

Login Cookies

initialCookies Optional Array

The scraper will use login cookies to bypass the login wall. For details, check the readme tab.

Improve Tweet Scraping with Browser Fallback

browserFallback Optional Boolean

Example input (full JSON):

{
  "searchTerms": [
    "RealdataAPI"
  ],
  "searchMode": "live",
  "profilesDesired": 10,
  "tweetsDesired": 100,
  "addTweetViewCount": true,
  "addUserInfo": true,
  "useCheerio": true,
  "mode": "replies",
  "startUrls": [],
  "useAdvancedSearch": false,
  "proxyConfig": {
    "useRealdataAPIProxy": true
  },
  "extendOutputFunction": "async ({ data, item, page, request, customData, RealdataAPI }) => {/n  return item;/n}",
  "extendScraperFunction": "async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, RealdataAPI, signal, label }) => {/n /n}",
  "customData": {},
  "handlePageTimeoutSecs": 500,
  "maxRequestRetries": 6,
  "maxIdleTimeoutSecs": 60,
  "debugLog": false
}