Ingest batch data
In this lesson, you will ingest batch data into Experience Platform using various methods.
Batch data ingestion allows you to ingest a large amount of data into Adobe Experience Platform at once. You can ingest batch data in a one-time upload within Platform’s interface or using the API. You can also configure regularly scheduled batch uploads from third-party services, such as cloud storage services, using Source connectors.
Data Engineers will need to ingest batch data outside of this tutorial.
Before you begin the exercises, watch this short video to learn more about data ingestion:
Permissions required
In the Configure Permissions lesson, you set up all the access controls required to complete this lesson.
You will need access to an (S)FTP server or cloud storage solution for the Sources exercise. There is a workaround if you do not have one.
Ingest data in batches with Platform user interface
Data can be uploaded directly into a dataset on the datasets screen in JSON and parquet formats. This is a great way to test ingestion of some of your data after creating a schema and dataset.
Download and prep the data
First, get the sample data and customize it for your tenant:
- Download luma-data.zip to your Luma Tutorial Assets folder.
- Unzip the file, creating a folder called luma-data, which contains the four data files we will use in this lesson.
- Open luma-loyalty.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your own schemas (or script the replacement as sketched after these steps).
- Save the updated file.
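If you would rather script the replacement than edit the file by hand, a minimal sketch along the following lines works; the file path and the _yourtenantid placeholder are assumptions you would swap for your own values.

```python
# Hypothetical one-off helper: swap the sample tenant id for your own.
from pathlib import Path

path = Path("luma-data/luma-loyalty.json")  # adjust to where you unzipped the file
text = path.read_text(encoding="utf-8")
path.write_text(text.replace("_techmarketingdemos", "_yourtenantid"), encoding="utf-8")
```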
Ingest the data
- In the Platform user interface, select Datasets in the left navigation
- Open your Luma Loyalty Dataset
- Scroll down until you see the Add Data section in the right column
- Upload the luma-loyalty.json file
- Once the file uploads, a row for the batch will appear
- If you reload the page after a few minutes, you should see that the batch has successfully uploaded with 1000 records and 1000 profile fragments.
- Enabling error diagnostics generates data about the ingestion of your data, which you can then review using the Data Access API. Learn more about it in the documentation.
- Partial ingestion allows you to ingest data containing errors, up to a certain threshold which you can specify. Learn more about it in the documentation. (A hedged API sketch for checking on a batch follows these notes.)
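As a lighter-weight check than retrieving the full diagnostics files, you can also look up the batch itself in the Catalog Service API. The sketch below is a rough illustration using Python’s requests library; the credentials are placeholders from your own Platform project, and the exact response fields may differ slightly from the comments.

```python
# Sketch: look up a batch via the Catalog Service API (all values are placeholders).
import requests

BATCH_ID = "YOUR_BATCH_ID"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "x-api-key": "YOUR_API_KEY",
    "x-gw-ims-org-id": "YOUR_IMS_ORG_ID",
    "x-sandbox-name": "YOUR_SANDBOX_NAME",
}

resp = requests.get(
    f"https://platform.adobe.io/data/foundation/catalog/batches/{BATCH_ID}",
    headers=headers,
)
resp.raise_for_status()

# Catalog responses are keyed by id; status should read "success" once ingestion finishes.
batch = resp.json().get(BATCH_ID, {})
print(batch.get("status"), batch.get("metrics"))
```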
Validate the data
There are a few ways to confirm that the data was successfully ingested.
Validate in the Platform user interface
To confirm that the data was ingested into the dataset:
- On the same page where you ingested the data, select the Preview dataset button in the top right
- Select the Preview button and you should be able to see some of the ingested data
To confirm that the data landed in Profile (may take a few minutes for the data to land):
- Go to Profiles in the left navigation
- Select the icon next to the Select identity namespace field to open the modal
- Select your Luma Loyalty Id namespace
- Enter one of the loyaltyId values from your dataset, 5625458
- Select View
Validate with data ingestion events
If you subscribed to data ingestion events in the previous lesson, check your unique webhook.site URL. You should see three requests show up in the following order, with some time in between them, with the following eventCode values:

- ing_load_success: the batch was ingested
- ig_load_success: the batch was ingested into identity graph
- ps_load_success: the batch was ingested into profile service
See the documentation for more details on the notifications.
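If you would rather capture these notifications in code than watch webhook.site, a small receiver along the lines of the sketch below works. It assumes Flask is available and that the eventCode values appear somewhere in the JSON payload, which you should confirm against a real notification.

```python
# Sketch: minimal webhook receiver that logs any eventCode found in the payload.
# The payload structure is an assumption to verify against an actual notification.
from flask import Flask, request

app = Flask(__name__)

def find_event_codes(obj):
    """Recursively collect every 'eventCode' value in a nested payload."""
    codes = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "eventCode":
                codes.append(value)
            else:
                codes.extend(find_event_codes(value))
    elif isinstance(obj, list):
        for item in obj:
            codes.extend(find_event_codes(item))
    return codes

@app.route("/webhook", methods=["POST"])
def webhook():
    payload = request.get_json(force=True, silent=True) or {}
    for code in find_event_codes(payload):
        # Expect ing_load_success, ig_load_success, ps_load_success over time.
        print("received eventCode:", code)
    return "ok"

if __name__ == "__main__":
    app.run(port=8000)
```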
Ingest data in batches with Platform API
Now let’s upload data using the API.
Download and prep the data
- You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
- Open luma-crm.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your schemas
- Save the updated file
Get the dataset id
First, let’s get the id of the dataset into which we want to ingest data:
- Open Postman
- If you don’t have an access token, open the request OAuth: Request Access Token and select Send to request a new access token, just like you did in the Postman lesson.
- Open your environment variables and make sure the value of CONTAINER_ID is still tenant
- Open the request Catalog Service API > Datasets > Retrieve a list of datasets. and select Send (a rough Python equivalent of this call is sketched after these steps)
- You should get a 200 OK response
- Copy the id of the Luma CRM Dataset from the Response body
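If you prefer to make the same call outside Postman, the sketch below is a rough Python equivalent; the header values are placeholders from your own Platform project, and matching on the dataset name is just one way to pick out the id.

```python
# Sketch: list datasets via the Catalog Service API and find the Luma CRM Dataset id.
import requests

headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "x-api-key": "YOUR_API_KEY",
    "x-gw-ims-org-id": "YOUR_IMS_ORG_ID",
    "x-sandbox-name": "YOUR_SANDBOX_NAME",
}

resp = requests.get("https://platform.adobe.io/data/foundation/catalog/dataSets", headers=headers)
resp.raise_for_status()

# Catalog responses are keyed by id; match on the dataset name to find yours.
for dataset_id, dataset in resp.json().items():
    if dataset.get("name") == "Luma CRM Dataset":
        print("dataset id:", dataset_id)
```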
Create the batch
Now we can create a batch in the dataset:
- Download Data Ingestion API.postman_collection.json to your Luma Tutorial Assets folder
- Import the collection into Postman
- Select the request Data Ingestion API > Batch Ingestion > Create a new batch in Catalog Service.
- Paste the following as the Body of the request, replacing the datasetId value with your own:

```json
{
  "datasetId": "REPLACE_WITH_YOUR_OWN_DATASETID",
  "inputFormat": {
    "format": "json"
  }
}
```

- Select the Send button
- You should get a 201 Created response containing the id of your new batch! (A raw-API equivalent of this request is sketched after these steps.)
- Copy the id of the new batch
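For reference, a rough raw-API equivalent of the create-batch request looks like this; the credentials are the same placeholders as in the earlier dataset-listing sketch.

```python
# Sketch: create a new batch for JSON ingestion (placeholder credentials and dataset id).
import requests

headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN", "x-api-key": "YOUR_API_KEY",
           "x-gw-ims-org-id": "YOUR_IMS_ORG_ID", "x-sandbox-name": "YOUR_SANDBOX_NAME"}

resp = requests.post(
    "https://platform.adobe.io/data/foundation/import/batches",
    headers={**headers, "Content-Type": "application/json"},
    json={"datasetId": "REPLACE_WITH_YOUR_OWN_DATASETID", "inputFormat": {"format": "json"}},
)
resp.raise_for_status()
print("batch id:", resp.json()["id"])  # a 201 Created response includes the new batch id
```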
Ingest the data
Now we can upload the data into the batch:
- Select the request Data Ingestion API > Batch Ingestion > Upload a file to a dataset in a batch.
- In the Params tab, enter your dataset id and batch id into their respective fields
- In the Params tab, enter luma-crm.json as the filePath
- In the Body tab, select the binary option
- Select the downloaded luma-crm.json from your local Luma Tutorial Assets folder
- Select Send and you should get a 200 OK response with ‘1’ in the response body (a raw-API sketch of this upload follows these steps)
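Continuing the hedged examples above, the raw-API version of this step is a PUT of the file’s bytes to a path built from the batch id, dataset id, and file name; all ids and credentials below are placeholders.

```python
# Sketch: upload luma-crm.json into the open batch (placeholder ids and credentials).
import requests

headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN", "x-api-key": "YOUR_API_KEY",
           "x-gw-ims-org-id": "YOUR_IMS_ORG_ID", "x-sandbox-name": "YOUR_SANDBOX_NAME"}
batch_id, dataset_id = "YOUR_BATCH_ID", "YOUR_DATASET_ID"

with open("luma-crm.json", "rb") as f:
    resp = requests.put(
        f"https://platform.adobe.io/data/foundation/import/batches/{batch_id}"
        f"/datasets/{dataset_id}/files/luma-crm.json",
        headers={**headers, "Content-Type": "application/octet-stream"},
        data=f,
    )
resp.raise_for_status()
print(resp.status_code)  # expect 200 OK
```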
At this point, if you look at your batch in the Platform user interface, you will see that it is in a “Loading” status:
Because the Batch API is often used to upload multiple files, you need to tell Platform when a batch is complete, which we will do in the next step.
Complete the batch
To complete the batch:
- Select the request Data Ingestion API > Batch Ingestion > Finish uploading a file to a dataset in a batch.
- In the Params tab, enter COMPLETE as the action
- In the Params tab, enter your batch id. Do not worry about dataset id or filePath, if they are present.
- Make sure that the URL of the POST is https://platform.adobe.io/data/foundation/import/batches/:batchId?action=COMPLETE and that there aren’t any unnecessary references to the datasetId or filePath
- Select Send and you should get a 200 OK response with ‘1’ in the response body (a raw-API sketch of this call follows these steps)
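And the raw-API sketch of the completion call, matching the URL shown above, with placeholder credentials:

```python
# Sketch: mark the batch COMPLETE so Platform starts processing it (placeholders throughout).
import requests

headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN", "x-api-key": "YOUR_API_KEY",
           "x-gw-ims-org-id": "YOUR_IMS_ORG_ID", "x-sandbox-name": "YOUR_SANDBOX_NAME"}
batch_id = "YOUR_BATCH_ID"

resp = requests.post(
    f"https://platform.adobe.io/data/foundation/import/batches/{batch_id}?action=COMPLETE",
    headers=headers,
)
resp.raise_for_status()
print(resp.status_code)  # expect 200 OK
```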
Validate the data
Validate in the Platform user interface
Validate the data has landed in the Platform user interface just like you did for the Loyalty dataset.
First, confirm the batch shows that 1000 records have ingested:
Next, confirm the batch using Preview dataset:
Finally, confirm that a profile has been created by looking up one of the profiles in the Luma CRM Id namespace, for example 112ca06ed53d3db37e4cea49cc45b71e
There is one interesting thing that just happened that I want to point out. Open that Danny Wright profile. The profile has both a Lumacrmid and a Lumaloyaltyid. Remember that the Luma Loyalty Schema contained two identity fields, Luma Loyalty Id and CRM Id. Now that we’ve uploaded both datasets, they’ve merged into a single profile. The Loyalty data had Daniel as the first name and “New York City” as the home address, while the CRM data had Danny as the first name and Portland as the home address for the customer with the same Loyalty Id. We will come back to why the first name displays Danny in the lesson on merge policies.
Congratulations, you’ve just merged profiles!
Validate with data ingestion events
If you subscribed to data ingestion events in the previous lesson, check your unique webhook.site URL. You should see three requests come in, just like with the loyalty data:
See the documentation for more details on the notifications.
Ingest data with Workflows
Let’s look at another way of uploading data. The workflows feature allows you to ingest CSV data which is not already modeled in XDM.
Download and prep the data
- You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
- Confirm that you have luma-products.csv
Create a workflow
Now let’s set up a workflow:
- Go to Workflows in the left navigation
- Select Map CSV to XDM schema and select the Launch button
- Select your Luma Product Catalog Dataset and select the Next button
- Add the luma-products.csv file you downloaded and select the Next button
- Now you are in the mapper interface, in which you can map a field from the source data (one of the column names in the luma-products.csv file) to XDM fields in the target schema. In our example, the column names are close enough to the schema field names that the mapper is able to auto-detect the right mapping! If the mapper were unable to auto-detect the right field, you would select the icon to the right of the target field to select the correct XDM field. Also, if you didn’t want to ingest one of the columns from the CSV, you could delete the row from the mapper. Feel free to play around and change column headings in the luma-products.csv file to get familiar with how the mapper works.
- Select the Finish button
Validate the data
When the batch has uploaded, verify the upload by previewing the dataset.
Since the Luma Product SKU is a non-people namespace, we won’t see any profiles for the product SKUs.
You should see the three hits to your webhook.
Ingest data with Sources
Okay, you did things the hard way. Now let’s move into the promised land of automated batch ingestion! When I say, “SET IT!” you say, “FORGET IT!” “SET IT!” “FORGET IT!” “SET IT!” “FORGET IT!” Just kidding, you would never do such a thing! Ok, back to work. You’re almost done.
Go to Sources in the left navigation to open the Sources catalog. Here you will see various out-of-the-box integrations with industry-leading data and storage providers.
Okay, let’s ingest data using a source connector.
This exercise will be choose-your-own-adventure style. I am going to show the workflow using the FTP source connector. You can either use a different Cloud Storage source connector that you use at your company, or upload the json file using the dataset user interface like we did with the loyalty data.
Many of the Sources have a similar configuration workflow, in which you:
- Enter your authentication details
- Select the data you want to ingest
- Select the Platform dataset into which you want to ingest it
- Map the fields to your XDM schema
- Choose the frequency with which you want to reingest data from that location
Download, prep, and upload the data to your preferred cloud storage vendor
- You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
- Open luma-offline-purchases.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your schemas
- Update all of the timestamps so that the events occur in the last month (for example, search for "timestamp":"2022-06 and replace the year and month); a script sketch for doing this in bulk follows these steps
- Choose your preferred cloud storage provider, making sure it is available in the Sources catalog
- Upload luma-offline-purchases.json to a location in your preferred cloud storage provider
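The script sketch mentioned above for shifting the timestamps might look like the following; it assumes the sample file uses the exact "timestamp":"YYYY-MM prefix shown earlier, so spot-check the result before uploading.

```python
# Sketch: rewrite every "timestamp":"YYYY-MM prefix in the sample file to last month.
import re
from datetime import date
from pathlib import Path

today = date.today()
last_month = date(today.year - 1, 12, 1) if today.month == 1 else date(today.year, today.month - 1, 1)

path = Path("luma-data/luma-offline-purchases.json")  # adjust to where you unzipped the file
text = path.read_text(encoding="utf-8")
updated = re.sub(r'("timestamp":")\d{4}-\d{2}', rf'\g<1>{last_month:%Y-%m}', text)
path.write_text(updated, encoding="utf-8")
```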
Ingest the data from your preferred cloud storage location
- In the Platform user interface, filter the Sources catalog to Cloud storage
- Note that there are convenient links to documentation under the ...
- In the box of your preferred cloud storage vendor, select the Configure button
- Authentication is the first step. Enter the name for your account, for example Luma’s FTP Account, and your authentication details. This step should be fairly similar for all cloud storage sources, although the fields may vary slightly. Once you’ve entered the authentication details for an account, you can reuse them for other source connections that might be sending different data on different schedules from other files in the same account
- Select the Connect to source button
- When Platform has successfully connected to the Source, select the Next button
- On the Select data step, the user interface will use your credentials to open the folder on your cloud storage solution
- Select the files you would like to ingest, for example luma-offline-purchases.json
- As the Data format, select XDM JSON
- You can then preview the JSON structure and sample data in your file
- Select the Next button
- On the Mapping step, select your Luma Offline Purchase Events Dataset and select the Next button. Note in the message that since the data we are ingesting is a JSON file, there is no mapping step where we map source fields to target fields. JSON data must already be in XDM. If you were ingesting a CSV, you would see the full mapping user interface on this step:
- On the Scheduling step, you choose the frequency with which you want to reingest data from the Source. Take a moment to look at the options. We are just going to do a one-time ingestion, so leave the Frequency on Once and select the Next button:
- On the Dataflow detail step, you can choose a name for your dataflow, enter an optional description, and turn on error diagnostics and partial ingestion. Leave the settings as they are and select the Next button:
- On the Review step, you can review all of your settings together and either edit them or select the Finish button
- After saving, you will land on a screen like this:
Validate the data
When the batch has uploaded, verify the upload by previewing the dataset.
You should see the three hits to your webhook.
Look up the profile with value 5625458 in the Luma Loyalty Id namespace again to see if there are any purchase events in their profile. You should see one purchase. You can dig into the details of the purchase by selecting View JSON:
ETL Tools
Adobe partners with multiple ETL vendors to support data ingestion into Experience Platform. Because of the variety of third-party vendors, ETL is not covered in this tutorial, although you are welcome to review some of these resources:
Additional Resources
Now let’s stream data using the Web SDK