Disclaimer: Real Data API extracts only publicly available data and maintains a strict policy against collecting any personal or identity-related information.
Match ecommerce products by gathering them from various websites and comparing them with AI Product Matcher. Run AI-based product matching across multiple ecommerce stores for market research, competitive analysis, and dynamic pricing. Our real-time AI product matching tool is available in countries such as the USA, UK, UAE, Canada, France, Germany, Singapore, Spain, and Mexico.
The AI product matching tool uses a customized machine learning model to resolve product mapping issues across online stores. Use it to discover similar products across various e-commerce sites, or to replace manual product mapping. Check out the input section below for detailed settings.
To use the enterprise AI data matching tool, you must have datasets of the products you want to match. If you don't have a dataset, you can scrape the products from the store page with any of our scrapers and use the generated dataset. If you already have the datasets, import them into your console account using the API. Note that the tool can only match product data in English.
Contact our enterprise team if you want us to manage your data funnel or design the scraper based on your custom requirements. Meanwhile, check out the below ecommerce data scraper available on our platform:
Here is how you can prepare the AI product matcher input. Check out the samples of tentative input at the end of the section.
There are two ways to use the scraper based on dataset format:
In the first case, you have a dataset in which each row contains information about two products to compare and match. Here, enter the pair dataset IDs in the input pair_dataset_ids. You can enter multiple IDs if you have several pair datasets to match at once. The AI product matching tool will go through all the data rows, compare the two products in each row, and decide how similar they are.
In the other case, you have separate datasets for the products from each e-commerce store. Put the dataset IDs into the inputs dataset1_ids and dataset2_ids. The tool will then check both datasets, look for matching products between them, and display the output.
Next, tell the AI product matcher which dataset format you use through the scraper input input_mapping. Provide it as JSON with the attributes eshop1 and eshop2, which describe where the scraper finds the required data for each ecommerce store. Each of these attributes must contain an object like the following example:
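The two input modes described above can be sketched as a small helper that builds the dataset part of the run input. Only the field names pair_dataset_ids, dataset1_ids, and dataset2_ids come from this readme; the function name `build_matcher_input` and the rule that the two modes are mutually exclusive are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def build_matcher_input(
    pair_dataset_ids: Optional[List[str]] = None,
    dataset1_ids: Optional[List[str]] = None,
    dataset2_ids: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Build the dataset part of the matcher input (hypothetical helper)."""
    has_pairs = bool(pair_dataset_ids)
    has_separate = bool(dataset1_ids) or bool(dataset2_ids)

    # Assumption: a run uses either pair datasets or separate datasets, not both.
    if has_pairs and has_separate:
        raise ValueError(
            "Use either pair_dataset_ids or dataset1_ids/dataset2_ids, not both."
        )
    if has_pairs:
        return {"pair_dataset_ids": list(pair_dataset_ids)}
    # The separate-dataset mode needs IDs for both stores.
    if dataset1_ids and dataset2_ids:
        return {
            "dataset1_ids": list(dataset1_ids),
            "dataset2_ids": list(dataset2_ids),
        }
    raise ValueError(
        "Provide pair_dataset_ids, or both dataset1_ids and dataset2_ids."
    )
```

The returned dictionary can be merged with input_mapping and the other inputs before starting a run.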
{
  "id": "productUrl",
  "name": "productName",
  "price": "currentPrice",
  "short_description": "short_description",
  "long_description": "long_description",
  "specification": "specification",
  "code": [
    "SKU",
    "ASIN"
  ]
}
Each attribute of the object tells the matcher where to find the corresponding product attribute in the dataset. For example, with the mapping above, the matcher reads the product name from the productName attribute of the dataset you provided. The matcher uses the product attributes shown in the example above (id, name, price, short_description, long_description, specification, and code). A specification value looks like this:
[
  { "key": "RAM memory", "value": "16 GB" },
  { "key": "CPU", "value": "Intel Core i3" },
  { "key": "Display resolution", "value": "1920:1080" }
]
You do not have to provide every one of these attributes; some e-commerce stores may not expose all of them. However, the matcher may give less accurate output when attributes are missing. Check out the performance section to learn more.
Once you have specified the input dataset format, specify the attributes you want the product matcher to include in the output dataset. Do this with the scraper input output_mapping, which works like input_mapping, as in the sample below:
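To see which attributes a given dataset row can actually supply, the mapping can be applied by hand before starting a run. The following is a minimal sketch of that idea, not the matcher's own implementation; the function name `extract_product` and the sample row are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def extract_product(
    row: Dict[str, Any], mapping: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Apply one eshop mapping to a dataset row (illustrative only).

    Returns the extracted attributes and the names of attributes
    the row could not supply.
    """
    product: Dict[str, Any] = {}
    missing: List[str] = []
    for attribute, source in mapping.items():
        if isinstance(source, list):
            # "code" lists several candidate fields; keep the ones present.
            value = [row[field] for field in source if row.get(field)]
        else:
            value = row.get(source)
        if value is None or value == "" or value == []:
            missing.append(attribute)
        else:
            product[attribute] = value
    return product, missing


mapping = {
    "id": "productUrl",
    "name": "productName",
    "price": "currentPrice",
    "code": ["SKU", "ASIN"],
}
# Hypothetical row: it has no currentPrice and no ASIN.
row = {
    "productUrl": "https://example.com/p/1",
    "productName": "Laptop 15",
    "SKU": "LP-15",
}
product, missing = extract_product(row, mapping)
print(product)
print("missing:", missing)
```

Running a check like this over a few rows shows in advance which attributes the matcher would have to do without.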
{
  "eshop1": {
    "id_source": "productUrl",
    "name_source": "productName"
  },
  "eshop2": {
    "id_target": "EAN",
    "name_target": "productName"
  }
}
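The effect of output_mapping can be illustrated with a short sketch that renames attributes from two input rows into one output row. This only illustrates the mapping semantics under the assumption that each entry maps an output name to an input attribute; the function name `build_output_row` and the sample rows are hypothetical, and the matcher adds its own attributes for the pair on top of these.

```python
from typing import Any, Dict


def build_output_row(
    row1: Dict[str, Any],
    row2: Dict[str, Any],
    output_mapping: Dict[str, Dict[str, str]],
) -> Dict[str, Any]:
    """Copy the mapped attributes of both products into one output row."""
    output: Dict[str, Any] = {}
    for shop, row in (("eshop1", row1), ("eshop2", row2)):
        # Each entry reads input_name from the row and stores it as output_name.
        for output_name, input_name in output_mapping.get(shop, {}).items():
            output[output_name] = row.get(input_name)
    return output


output_mapping = {
    "eshop1": {"id_source": "productUrl", "name_source": "productName"},
    "eshop2": {"id_target": "EAN", "name_target": "productName"},
}
# Hypothetical rows from the two stores.
row1 = {"productUrl": "https://shop1.example/p/1", "productName": "Laptop 15"}
row2 = {"EAN": "0123456789012", "productName": "Laptop 15 inch"}

result = build_output_row(row1, row2, output_mapping)
print(result)
```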
Specify the attributes for each store separately. Each entry maps an attribute name in the output dataset, for instance name_source, to the input dataset attribute it is read from, like productName. Additionally, the resulting dataset will include two attributes for the product pair:
As mentioned above, you can use Product Matcher either to replace manual product matching entirely or to make it more efficient, depending on the settings. Because no machine learning model is flawless, some of its output will be wrong. The precision/recall tradeoff setting, shown as precision/recall tradeoff in the input form or as the attribute precision_recall in JSON input, lets you choose which kind of mistake the tool should minimize. Choose one of the two settings, precision or recall.
See the expected performance section of this readme for specific performance numbers.
{
  "pair_dataset_ids": [
    "Insert your dataset IDs here"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "id1",
      "name": "name1",
      "price": "price1",
      "short_description": "short_description1",
      "long_description": "long_description1",
      "specification": "specification1",
      "code": [
        "SKU",
        "ASIN"
      ]
    },
    "eshop2": {
      "id": "id2",
      "name": "name2",
      "price": "price2",
      "short_description": "short_description2",
      "long_description": "long_description2",
      "specification": "specification2",
      "code": [
        "EAN",
        "ASIN"
      ]
    }
  },
  "output_mapping": {
    "eshop1": {
      "id_source": "id1",
      "name_source": "name1"
    },
    "eshop2": {
      "id_target": "id2",
      "name_target": "name2"
    }
  },
  "precision_recall": "precision"
}
{
  "dataset1_ids": [
    "Insert your dataset IDs here"
  ],
  "dataset2_ids": [
    "Insert your dataset IDs here"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "SKU",
        "ASIN"
      ]
    },
    "eshop2": {
      "id": "productUrl",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specifications",
      "code": [
        "EAN",
        "ASIN"
      ]
    }
  },
  "output_mapping": {
    "eshop1": {
      "id_source": "url",
      "name_source": "name"
    },
    "eshop2": {
      "id_target": "productUrl",
      "name_target": "name"
    }
  },
  "precision_recall": "precision"
}
The tool stores the product matching results in the default dataset of the scraper run, which you can find on the run page of your console account. Export the results manually or through the API in Excel, CSV, or JSON format.
See the subsection on the output format above for more details.
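Once the result items have been fetched as a list of dictionaries, they can also be converted locally with the Python standard library. The sketch below is one way to do that; the function name `items_to_csv` and the sample items are hypothetical, and the attribute names in real results depend on your output_mapping.

```python
import csv
import io
import json
from typing import Any, Dict, List


def items_to_csv(items: List[Dict[str, Any]]) -> str:
    """Serialize result items to CSV text, one row per item."""
    # Collect column names in first-seen order, since items may differ in keys.
    fieldnames: List[str] = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)  # keys missing from an item become empty cells
    return buffer.getvalue()


# Hypothetical result items for illustration.
items = [
    {"id_source": "a", "id_target": "b"},
    {"id_source": "c", "id_target": "d", "name_target": "x"},
]
csv_text = items_to_csv(items)
json_text = json.dumps(items, indent=2)
print(csv_text)
```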
Our team continually works to make the AI product matcher more accurate through experimentation and analysis. We gathered thousands of manually annotated product pairs from various categories and used that data to train the model. We also set aside a separate dataset of product pairs that the model did not see during training, and we measured its performance on that unseen data. The accuracy of the results depends on the precision/recall tradeoff setting.
In our tests, the AI product matcher was around 95 percent precise, and it found around 60 percent of the product pairs that contained the same product.
Even though we train and test the AI model for precision and recall, results may vary with data from different sources, so we recommend evaluating the matcher on a sample of your own data before importing large-scale data into the tool.
The pricing model is pay-per-result, which means you pay a small amount for each result. Check our pricing page for detailed pricing. The scraper charges depend on the number of results and the input type for the product pairs.
To restrict the number of results and your budget, set the corresponding limits in the scraper options.
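As a worked example of what these two figures mean together, the sketch below turns a precision and a recall into expected counts. The 1,000 true matches are a made-up number for illustration, and the function name `expected_counts` is hypothetical.

```python
from typing import Dict


def expected_counts(true_matches: int, precision: float, recall: float) -> Dict[str, int]:
    """Estimate found, wrong, and missed pairs from precision and recall."""
    # Recall is the share of true matches that the matcher finds.
    found = recall * true_matches
    # Precision is the share of reported matches that are correct,
    # so the total number of reported matches is found / precision.
    reported = found / precision
    return {
        "found": round(found),
        "wrong": round(reported - found),
        "missed": round(true_matches - found),
    }


counts = expected_counts(true_matches=1000, precision=0.95, recall=0.60)
print(counts)
```

With these inputs the estimate is roughly 600 correct matches found, 32 incorrect matches reported, and 400 true matches missed.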
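One way to run such a check is to label a small sample of pairs by hand and compare the labels with the matcher's decisions. The sketch below computes precision and recall from two lists of booleans; how you read the match decision from the output dataset is not specified here, so the sample lists are hypothetical.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Compute precision and recall of predicted matches against hand labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Return 0.0 when a ratio is undefined (no predicted or no actual matches).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical sample: matcher decisions versus manual labels for five pairs.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```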
Check out how industries are using AI Product Matcher around the world.
E-commerce & Retail
You should have a Real Data API account to execute the program examples. Replace <YOUR_API_TOKEN> in the program with the token of your scraper. Read the Real Data API docs for more explanation of the live APIs.
import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "dataset1_ids": [
        "GYVCj4hEeqnX3dJyu"
    ],
    "dataset2_ids": [
        "OmzHV4VEByO4KohMF"
    ],
    "input_mapping": {
        "eshop1": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel"
            ]
        },
        "eshop2": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel"
            ]
        }
    }
};

(async () => {
    // Run the Actor and wait for it to finish
    const run = await client.actor("equidem/ai-product-matcher").call(input);

    // Fetch and print Actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "dataset1_ids": ["GYVCj4hEeqnX3dJyu"],
    "dataset2_ids": ["OmzHV4VEByO4KohMF"],
    "input_mapping": {
        "eshop1": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel",
            ],
        },
        "eshop2": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel",
            ],
        },
    },
}

# Run the Actor and wait for it to finish
run = client.actor("equidem/ai-product-matcher").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
# Set API token (quoted so the shell does not treat < and > as redirections)
API_TOKEN='<YOUR_API_TOKEN>'

# Prepare Actor input
cat > input.json <<'EOF'
{
  "dataset1_ids": [
    "GYVCj4hEeqnX3dJyu"
  ],
  "dataset2_ids": [
    "OmzHV4VEByO4KohMF"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    },
    "eshop2": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    }
  }
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/equidem~ai-product-matcher/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
dataset1_ids
Optional Array
Dataset IDs containing product information from the first store.
dataset2_ids
Optional Array
Dataset IDs containing product information from the second store.
pair_dataset_ids
Optional Array
Dataset IDs containing the product pair information to match.
input_mapping
Required Object
Mapping object that tells the product matching model which dataset attributes to use.
output_mapping
Optional Object
Mapping object that lists the data attributes you want in the output dataset and the names to give them.
precision_recall
Optional Enum
Whether the matcher should prioritize recall or precision.
Allowed values: recall (string), precision (string)
{
  "dataset1_ids": [
    "GYVCj4hEeqnX3dJyu"
  ],
  "dataset2_ids": [
    "OmzHV4VEByO4KohMF"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    },
    "eshop2": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    }
  },
  "precision_recall": "precision"
}