Privacy request processing in the data lake
Adobe Experience Platform Privacy Service processes customer requests to access, opt out of sale, or delete their personal data as delineated by legal and organizational privacy regulations.
This document covers essential concepts related to processing privacy requests for customer data stored in the data lake.
Getting started
It is recommended that you have a working understanding of the following Experience Platform services before reading this guide:
- Privacy Service: Manages customer requests for accessing, opting out of sale, or deleting their personal data across Adobe Experience Cloud applications.
- Catalog Service: The system of record for data location and lineage within Experience Platform. Provides an API that can be used to update dataset metadata.
- Experience Data Model (XDM) System: The standardized framework by which Experience Platform organizes customer experience data.
- Identity Service: Solves the fundamental challenge posed by the fragmentation of customer experience data by bridging identities across devices and systems.
Understanding identity namespaces
Adobe Experience Platform Identity Service bridges customer identity data across systems and devices. Identity Service uses identity namespaces to provide context to identity values by relating them to their system of origin. A namespace can represent a generic concept such as an email address (“Email”) or associate the identity with a specific application, such as an Adobe Advertising Cloud ID (“AdCloud”) or Adobe Target ID (“TNTID”).
Identity Service maintains a store of globally defined (standard) and user-defined (custom) identity namespaces. Standard namespaces are available for all organizations (for example, “Email” and “ECID”), while your organization can also create custom namespaces to suit its particular needs.
For more information about identity namespaces in Experience Platform, see the identity namespace overview.
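As a rough illustration of how a namespace gives context to an identity value, the sketch below models an identity as a namespace/value pair. The `Identity` class, the `group_by_namespace` helper, and the `LoyaltyEmail` custom namespace are hypothetical names used only for this example; they are not part of Identity Service.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Identity:
    """An identity value qualified by the namespace it belongs to."""
    namespace: str  # e.g. "Email", "ECID", "AdCloud", "TNTID", or a custom namespace
    value: str


def group_by_namespace(identities: Iterable[Identity]) -> Dict[str, List[str]]:
    """Collect identity values under their namespace."""
    grouped: Dict[str, List[str]] = {}
    for identity in identities:
        grouped.setdefault(identity.namespace, []).append(identity.value)
    return grouped


# The same raw string is a different identity under a different namespace.
standard = Identity("Email", "jdoe@example.com")
custom = Identity("LoyaltyEmail", "jdoe@example.com")  # hypothetical custom namespace
```

Because the namespace is part of the identity, `standard` and `custom` compare as different identities even though their values are identical.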
Adding identity data to datasets
When creating privacy requests for the data lake, valid identity values (and their associated namespaces) must be provided for each individual customer in order to locate their data and process it accordingly. Therefore, all datasets that are subject to privacy requests must contain an identity descriptor in their associated XDM schema.
This section walks through the steps of adding an identity descriptor to an existing dataset’s XDM schema. If you already have a dataset with an identity descriptor, you can skip ahead to the next section.
There are two methods of adding an identity descriptor to a dataset schema:
Using the UI
In the Experience Platform user interface, the Schemas workspace allows you to edit your existing XDM schemas. To add an identity descriptor to a schema, select the schema from the list and follow the steps for setting a schema field as an identity field in the Schema Editor tutorial.
Once you have set the appropriate fields within the schema as identity fields, you can proceed to the next section on submitting privacy requests.
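Using the API
Note: The following steps assume that you know the unique URI ID of the dataset’s XDM schema, which can be found under schemaRef.id in the dataset’s Catalog record. For important information related to using the Schema Registry API, including knowing your {TENANT_ID} and the concept of containers, see the getting started section of the API guide.
You can add an identity descriptor to a dataset’s XDM schema by making a POST request to the /descriptors endpoint in the Schema Registry API.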
API format
POST /descriptors
Request
The following request defines an identity descriptor on an “email address” field in a sample schema.
curl -X POST \
  https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '
      {
        "@type": "xdm:descriptorIdentity",
        "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
        "xdm:sourceVersion": 1,
        "xdm:sourceProperty": "/personalEmail/address",
        "xdm:namespace": "Email",
        "xdm:property": "xdm:code",
        "xdm:isPrimary": false
      }'
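The request payload uses the following properties:
- @type: The type of descriptor being created. For identity descriptors, the value is xdm:descriptorIdentity.
- xdm:sourceSchema: The unique URI ID of your dataset’s XDM schema.
- xdm:sourceVersion: The version of the XDM schema specified in xdm:sourceSchema.
- xdm:sourceProperty: The path to the schema field that the descriptor is applied to.
- xdm:namespace: The identity namespace for the field, either a standard namespace or a custom namespace defined by your organization.
- xdm:property: Either xdm:id or xdm:code, depending on the namespace used under xdm:namespace.
- xdm:isPrimary: A boolean value that indicates whether the field is the primary identity for the schema.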
Response
A successful response returns HTTP status 201 (Created) and the details of the newly created descriptor.
{
  "@type": "xdm:descriptorIdentity",
  "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
  "xdm:sourceVersion": 1,
  "xdm:sourceProperty": "/personalEmail/address",
  "xdm:namespace": "Email",
  "xdm:property": "xdm:code",
  "xdm:isPrimary": false,
  "meta:containerId": "tenant",
  "@id": "f3a1dfa38a4871cf4442a33074c1f9406a593407"
}
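If you create descriptors for several fields, it can help to build the payload in code and confirm that the response echoes what was sent. The following is a minimal sketch that mirrors the request and response shown above. The function names `build_identity_descriptor` and `response_echoes_request` are hypothetical helpers, not part of the Schema Registry API, and the sketch makes no network calls.

```python
from typing import Any, Dict


def build_identity_descriptor(
    schema_uri: str,
    schema_version: int,
    field_path: str,
    namespace: str,
    namespace_property: str = "xdm:code",
    is_primary: bool = False,
) -> Dict[str, Any]:
    """Build the JSON body for POST /descriptors (identity descriptor)."""
    return {
        "@type": "xdm:descriptorIdentity",
        "xdm:sourceSchema": schema_uri,
        "xdm:sourceVersion": schema_version,
        "xdm:sourceProperty": field_path,
        "xdm:namespace": namespace,
        "xdm:property": namespace_property,
        "xdm:isPrimary": is_primary,
    }


def response_echoes_request(request_body: Dict[str, Any],
                            response_body: Dict[str, Any]) -> bool:
    """Check that every submitted field is echoed and an "@id" was assigned."""
    echoed = all(response_body.get(key) == value
                 for key, value in request_body.items())
    return echoed and "@id" in response_body


# Values copied from the sample request and response in this guide.
descriptor = build_identity_descriptor(
    "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
    1,
    "/personalEmail/address",
    "Email",
)
sample_response = dict(
    descriptor,
    **{"meta:containerId": "tenant",
       "@id": "f3a1dfa38a4871cf4442a33074c1f9406a593407"},
)
```

Here `response_echoes_request(descriptor, sample_response)` evaluates to `True`, because the sample response contains every submitted property plus the generated `@id`.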
Submitting requests
The following section outlines how to make privacy requests for the data lake using the Privacy Service UI or API.
Using the UI
When creating job requests in the UI, be sure to select AEP Data Lake under Products in order to process jobs for data stored in the data lake.
Using the API
When creating job requests in the API, any userIDs that are provided must use a specific namespace and type depending on the data store they apply to. IDs for the data lake must use unregistered for their type value, and a namespace value that matches one of the privacy labels that have been added to applicable datasets.
In addition, the include array of the request payload must include the product values for the different data stores the request is being made to. When making requests to the data lake, the array must include the value aepDataLake.
The following request creates a new privacy job for the data lake, using the unregistered email_label namespace. It also includes the product value for the data lake in the include array:
curl -X POST \
  https://platform.adobe.io/data/core/privacy/jobs \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'Content-Type: application/json' \
  -d '{
    "companyContexts": [
      {
        "namespace": "imsOrgID",
        "value": "{ORG_ID}"
      }
    ],
    "users": [
      {
        "key": "user12345",
        "action": ["access","delete"],
        "userIDs": [
          {
            "namespace": "email_label",
            "value": "ajones@acme.com",
            "type": "unregistered"
          },
          {
            "namespace": "email_label",
            "value": "jdoe@example.com",
            "type": "unregistered"
          }
        ]
      }
    ],
    "include": ["aepDataLake"],
    "expandIds": false,
    "priority": "normal",
    "regulation": "ccpa"
  }'
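The payload in the request above can also be assembled in code, which makes it harder to forget the unregistered type or the aepDataLake product value. The following is a minimal sketch under that assumption. `build_data_lake_job` is a hypothetical helper name, the field names are copied from the sample request, and the sketch only builds the JSON body without sending it.

```python
import json
from typing import Any, Dict, List


def build_data_lake_job(
    org_id: str,
    users: Dict[str, List[str]],
    label_namespace: str,
    actions: List[str],
    regulation: str,
) -> Dict[str, Any]:
    """Build a privacy job body that targets the data lake.

    `users` maps a user key to that user's identity values. Every ID is
    emitted with type "unregistered" and the given label namespace.
    """
    return {
        "companyContexts": [{"namespace": "imsOrgID", "value": org_id}],
        "users": [
            {
                "key": key,
                "action": list(actions),
                "userIDs": [
                    {"namespace": label_namespace, "value": value,
                     "type": "unregistered"}
                    for value in values
                ],
            }
            for key, values in users.items()
        ],
        "include": ["aepDataLake"],  # product value for the data lake
        "expandIds": False,
        "priority": "normal",
        "regulation": regulation,
    }


# Reproduces the body of the sample request in this guide.
example_job = build_data_lake_job(
    "{ORG_ID}",
    {"user12345": ["ajones@acme.com", "jdoe@example.com"]},
    "email_label",
    ["access", "delete"],
    "ccpa",
)
print(json.dumps(example_job, indent=2))
```

The printed JSON can be passed as the request body to the POST call on the /jobs endpoint of the Privacy Service API.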
Note: Any x-sandbox-name header included in a privacy job request is ignored by the system.
Delete request processing
When Experience Platform receives a delete request from Privacy Service, Platform sends confirmation to Privacy Service that the request has been received and affected data has been marked for deletion. The records are then removed from the data lake within seven days. During that seven-day window, the data is soft-deleted and is therefore not accessible by any Platform service.
If you also included ProfileService or identity in the privacy request, their associated data is handled separately. See the section on delete request processing for Profile for more information.
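To reason about the soft-delete window for the data lake, the arithmetic can be sketched as follows. This assumes the seven-day window is counted from the moment Platform confirms that the data has been marked for deletion; the guide does not state the exact starting point, and the names `SOFT_DELETE_WINDOW` and `latest_removal_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Records are removed from the data lake within seven days of being marked.
SOFT_DELETE_WINDOW = timedelta(days=7)


def latest_removal_time(marked_at: datetime) -> datetime:
    """Return the latest time by which marked records should be removed.

    Assumption: the window starts when the data is marked for deletion.
    """
    return marked_at + SOFT_DELETE_WINDOW


marked = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = latest_removal_time(marked)
```

With the example timestamp, `deadline` is 12:00 UTC on January 8, 2024. Removal can happen at any point before that time, and the data is already inaccessible to Platform services throughout the window.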
Next steps
By reading this document, you have been introduced to the important concepts involved with processing privacy requests for the data lake. It is recommended that you continue reading the documentation provided throughout this guide in order to deepen your understanding of how to manage identity data and create privacy jobs.
See the document on privacy request processing for Real-Time Customer Profile for steps on processing privacy requests for the Profile store.
Appendix
The following section contains additional information for processing privacy requests in the data lake.
Labeling nested map-type fields
It is important to note that there are two kinds of nested map-type fields that do not support privacy labeling:
- A map-type field within an array-type field
- A map-type field within another map-type field
Privacy job processing for either of the two examples above will eventually fail. For this reason, it is recommended that you avoid using nested map-type fields to store private customer data. Relevant consumer IDs should be stored as a non-map datatype within the identityMap field (itself a map-type field) for record-based datasets, or the endUserID field for time-series-based datasets.
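One way to catch the two unsupported cases before ingesting data is to scan a schema definition for them. The sketch below is illustrative only. It assumes a simplified JSON-Schema-style dictionary in which a map-type field is an object whose additionalProperties is itself a schema and an array-type field carries its element schema under items; real XDM schemas may need extra handling such as resolving references. `find_nested_maps` and `example_schema` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

Schema = Dict[str, Any]


def is_map(node: Schema) -> bool:
    """Assumption: a map is an object with a schema under additionalProperties."""
    return (node.get("type") == "object"
            and isinstance(node.get("additionalProperties"), dict))


def find_nested_maps(node: Schema, path: str = "",
                     parent: Optional[str] = None) -> List[str]:
    """Return paths of map-type fields nested in an array or another map."""
    problems: List[str] = []
    if is_map(node):
        if parent is not None:
            problems.append(f"{path}: map within {parent}")
        # Values of this map are now "within a map".
        problems += find_nested_maps(node["additionalProperties"],
                                     path + "/*", "map")
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        # Elements of this array are now "within an array".
        problems += find_nested_maps(node["items"], path + "[]", "array")
    # Plain object properties inherit the enclosing array/map context.
    for name, child in node.get("properties", {}).items():
        problems += find_nested_maps(child, f"{path}/{name}", parent)
    return problems


example_schema: Schema = {
    "type": "object",
    "properties": {
        # A top-level map of arrays (like identityMap) is not flagged.
        "identityMap": {
            "type": "object",
            "additionalProperties": {
                "type": "array",
                "items": {"type": "object",
                          "properties": {"id": {"type": "string"}}},
            },
        },
        # Map within a map.
        "prefs": {
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "additionalProperties": {"type": "string"},
            },
        },
        # Map within an array.
        "events": {
            "type": "array",
            "items": {"type": "object",
                      "additionalProperties": {"type": "string"}},
        },
    },
}
```

Calling `find_nested_maps(example_schema)` reports the `prefs` and `events` fields and leaves `identityMap` alone, since a top-level map whose values are arrays of plain objects does not match either unsupported case.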