Disclaimer: Real Data API only extracts publicly available data while maintaining a strict policy against collecting any personal or identity-related information.
Scrape and monitor website content for changes on web pages. Automatically store before and after snapshots and get email alerts using the content checker. Use the content-checking tool in countries like the USA, UK, UAE, France, Australia, Germany, Spain, Singapore, Mexico, and more.
The content scraper lets you track website content on any web page. When it detects a change, it sends an email alert with before and after screenshots. Use these alerts and screenshots as a watchdog for product sales, updates, prices, and competitors, or simply to monitor content changes on selected web pages.
Technically, it scrapes the textual content of the website using the content selector and compares it with the content from the previous run. If the content has changed, it runs another scraper that stores screenshots and sends them in an email.
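For illustration only, the core idea amounts to extracting the text behind a selector and diffing it against the value stored from the previous run. The sketch below uses Playwright and hypothetical names; it is not the actor's actual source code.

// Sketch only: how a content check could work in principle (hypothetical names).
import { chromium } from 'playwright';

const checkForChange = async (url, contentSelector, previousContent) => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto(url);

    // Extract the textual content behind the selector
    const currentContent = await page.locator(contentSelector).innerText();
    await browser.close();

    // A simple string comparison against the last run decides whether to alert
    return {
        changed: previousContent !== currentContent,
        previousContent,
        currentContent,
    };
};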
The content-scraping tool needs a content selector, a URL, and an email address as input to scrape the website content. You can also define a separate screenshot selector; otherwise, the content selector is used for the screenshots.
Check out the input tab for a detailed description of each input field.
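For example, a minimal input with only the required fields could look like this (the email address below is just a placeholder):

{
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationTo": "you@example.com"
}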
After execution, the scraper updates the content and screenshots in the key-value store associated with the scraper task.
If the content has changed, the content checker calls another scraper to send the email alert.
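For example, assuming the run keeps its latest snapshot in the default key-value store, you could read it back with the same client used in the examples below. Note that the keyValueStore method, the defaultKeyValueStoreId field, and the 'OUTPUT' record key are assumptions made by analogy with the dataset example further down the page, not a documented contract.

// Sketch only: reading a stored snapshot back from the run's key-value store.
import { RealdataAPIClient } from 'RealdataAPI-Client';

const client = new RealdataAPIClient({ token: '<YOUR_API_TOKEN>' });

(async () => {
    // Run the content checker as in the examples further down the page
    const run = await client.actor("jakubbalada/content-checker").call({
        "url": "https://www.RealdataAPI.com/change-log",
        "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
        "sendNotificationTo": "you@example.com",
    });

    // Read the stored snapshot record from the run's default key-value store
    const store = client.keyValueStore(run.defaultKeyValueStoreId);
    const record = await store.getRecord('OUTPUT');
    console.log(record);
})();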
Check out the example below of an email alert with the previous and changed content and screenshots:
You can connect the content checker to any web application or cloud service using the integrations available on our platform, including Zapier, Make, Airbyte, Google Drive, Google Sheets, Slack, GitHub, and more. The content-checking tool also lets you use webhooks to trigger an action when an event occurs; for example, you can receive an alert after a successful run of the content-scraping tool.
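As a rough illustration, a webhook that calls your own endpoint after a successful run could be registered over the HTTP API like this. The /v2/webhooks path, the event type, and the payload fields are assumptions modelled on the /v2/acts endpoint used in the cURL example below; check the platform's API reference before relying on them.

// Sketch only: registering a webhook for successful runs (Node 18+, built-in fetch).
// Endpoint path, event type, and payload shape are assumptions, not documented values.
(async () => {
    const response = await fetch(
        `https://api.RealdataAPI.com/v2/webhooks?token=${process.env.API_TOKEN}`,
        {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                condition: { actorId: 'jakubbalada~content-checker' },
                eventTypes: ['ACTOR.RUN.SUCCEEDED'],
                requestUrl: 'https://example.com/my-webhook-endpoint',
            }),
        },
    );
    console.log(await response.json());
})();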
Our scraper gives you programmatic access to the platform. It is organized around RESTful HTTP endpoints that let you schedule, manage, and run the scrapers available on our platform. Real Data API also allows you to retrieve results, monitor scraper performance, create and update scraper versions, access datasets, and more. Use our client NPM package or client PyPI package to access the scraper from Node.js or Python, respectively.
If the content checker can't deliver what you need, you can develop a customized scraper to fit your requirements. Multiple scraper templates on our platform support TypeScript, Python, and JavaScript to get you started. Alternatively, you can write the code directly using Crawlee, the open-source scraping library, as sketched below.
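As a rough starting point, a custom checker built on Crawlee could extract the watched element with CheerioCrawler. The URL and selector below are simply the ones used elsewhere on this page; the comparison and alerting logic is left as a comment.

// Sketch only: a custom content check built on the open-source Crawlee library.
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        // Extract the text behind the selector you want to watch
        const content = $('[class^=change-log__MonthBox-]:nth-of-type(1) ul').text().trim();
        log.info(`Content at ${request.url}:\n${content}`);
        // Compare 'content' against a previously stored value and alert on change
    },
});

await crawler.run(['https://www.RealdataAPI.com/change-log']);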
If you would rather not develop it yourself, contact us for a customized scraping solution.
Our team is constantly working on improving the scraper's performance. If you want to suggest anything or report a bug, please create an issue from the Issues tab or email us about it.
To run the code examples, you need a RealdataAPI account. Replace <YOUR_API_TOKEN> in the code with your API token.
import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationText": "RealdataAPI found a new change!",
    "proxy": {
        "useRealdataAPIProxy": false
    },
    "navigationTimeout": 30000
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("jakubbalada/content-checker").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationText": "RealdataAPI found a new change!",
    "proxy": { "useRealdataAPIProxy": False },
    "navigationTimeout": 30000,
}

# Run the actor and wait for it to finish
run = client.actor("jakubbalada/content-checker").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare actor input
cat > input.json <<'EOF'
{
  "url": "https://www.RealdataAPI.com/change-log",
  "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "sendNotificationText": "RealdataAPI found a new change!",
  "proxy": {
    "useRealdataAPIProxy": false
  },
  "navigationTimeout": 30000
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/jakubbalada~content-checker/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
url
Required, String
Choose the webpage URL you want to monitor.

contentSelector
Required, String
The CSS selector of the target area you want to track.

screenshotSelector
Optional, String
The CSS selector of the area to capture in screenshots.

sendNotificationTo
Required, String
Enter the email address that should receive the notifications.

sendNotificationText
Optional, String
Optional text to add to the email alert.

informOnError
Optional, Enum ("true" | "false")
If set to "true", you will receive an email alert with a screenshot whenever a selector fails on the webpage.

proxy
Optional, Object
Choose the relevant proxy server if the source website blocks the scraper or its IP address.

navigationTimeout
Optional, Integer
The duration in milliseconds after which page navigation times out.

retryStrategy
Optional, Enum ("never-retry" | "on-all-errors" | "on-block")
Sometimes the webpage fails to load correctly or the source website blocks the scraper; in other cases the selector is simply wrong and retrying won't help. Choose when the check should be retried. Note that blocked-page recognition is not 100 percent accurate.

maxRetries
Optional, Integer
How many times the scraper should retry the check if it fails.
{
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationText": "RealdataAPI found a new change!",
    "informOnError": "false",
    "proxy": {
        "useRealdataAPIProxy": false
    },
    "navigationTimeout": 30000,
    "retryStrategy": "on-block",
    "maxRetries": 5
}