Use the Deduplication activity’s Merge functionality
About this use case
This use case describes how to use the Merge functionality in the Deduplication activity.
For more information on this functionality, refer to this section.
The Deduplication activity is used for removing duplicate rows from a data set. In this use case, the data shown below is duplicated based on the Email field.
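As a rough illustration of what plain deduplication means, independent of the product itself, the sketch below keeps the first row seen for each email value. The `deduplicate` function, the field names, and the sample rows are hypothetical and are not the data set from this use case; exact string matching on the email is an assumption.

```python
def deduplicate(rows, key="email"):
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Hypothetical sample rows: two of them share the same email.
rows = [
    {"email": "ann@example.com", "first_name": "Ann"},
    {"email": "ann@example.com", "first_name": "Anne"},
    {"email": "bob@example.com", "first_name": "Bob"},
]
unique_rows = deduplicate(rows)
```

With this approach, every field of the surviving row comes from a single record, and the values held by the discarded duplicates are lost.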
With the Deduplication activity’s Merge functionality, you can configure a set of deduplication rules that define groups of fields to merge into a single resulting data record. For example, within a set of duplicate records, you can choose to keep the oldest phone number or the most recent name.
Activating the Merge functionality
To enable the merge functionality, you first need to configure the Deduplication activity. To do this, follow these steps:
- Open the activity, then click the Edit configuration link.
- Select the reconciliation field to use for the deduplication, then click Next. In this example, we want to deduplicate based on the email field.
- Click the Advanced parameters link, then activate the Merge records and Use several record merging criteria options.
- The Merge tab is added to the Deduplication configuration screen. We will use this tab to specify the data to merge when performing deduplication.
Configuring the fields to merge
Here are the rules we want to use to merge the data into a single record:
- Keep the most recent name (first name and last name fields),
- Keep the most recent mobile phone,
- Keep the oldest phone number,
- All fields in a group must be non-null to be eligible for the final record.
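The rules above can be sketched in Python to make their combined effect concrete. This is only an approximation under stated assumptions, not the activity's actual implementation: the `MERGE_RULES` structure, the `merge_duplicates` function, the field names, and the sample records are all hypothetical, and "oldest" and "most recent" are assumed to be determined by a `last_modified` date.

```python
from datetime import date

# Hypothetical rule set: one entry per group of fields to merge.
MERGE_RULES = [
    {"id": "name", "fields": ["first_name", "last_name"], "newest_first": True},
    {"id": "mobile", "fields": ["mobile"], "newest_first": True},
    {"id": "phone", "fields": ["phone"], "newest_first": False},
]


def merge_duplicates(records, rules=MERGE_RULES, key="email", sort_key="last_modified"):
    """Merge records sharing the same key into one record, group by group."""
    merged = {key: records[0][key]}
    for rule in rules:
        # A record is eligible only if every field in the group is non-null.
        eligible = [
            r for r in records
            if all(r.get(field) is not None for field in rule["fields"])
        ]
        eligible.sort(key=lambda r: r[sort_key], reverse=rule["newest_first"])
        for field in rule["fields"]:
            merged[field] = eligible[0][field] if eligible else None
    return merged


# Hypothetical duplicates sharing one email address.
records = [
    {"email": "j.doe@example.com", "first_name": "Jon", "last_name": "Doe",
     "mobile": None, "phone": "555-0100", "last_modified": date(2020, 1, 15)},
    {"email": "j.doe@example.com", "first_name": "John", "last_name": None,
     "mobile": "555-0199", "phone": "555-0101", "last_modified": date(2021, 6, 1)},
    {"email": "j.doe@example.com", "first_name": "Jonathan", "last_name": "Doe",
     "mobile": "555-0142", "phone": None, "last_modified": date(2022, 3, 10)},
]
merged = merge_duplicates(records)
```

In this sketch, the second record is skipped for the name group because its last name is null, so the name comes from the third record, while the phone number comes from the first (oldest) record.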
To configure these rules, follow these steps:
- Open the Merge tab, then click the Add button.
- Specify the identifier and label of the group of fields to be merged.
- Indicate the conditions for selecting the records to be taken into account.
- Sort on the last modification date in order to select the most recent name.
- Select the fields to merge. In this example, we want to keep the first name and last name fields.
- The fields are added to the set of data to merge and a new element is added to the workflow schema.
Repeat these steps to configure the mobile phone and phone fields.
Results
After configuring these rules, the following data is received at the end of the Deduplication activity.
The resulting record is merged from the three duplicate records according to the rules configured earlier: it combines the most recent name and mobile phone with the oldest phone number.