This article provides an overview of the factors that impact Smart Ingest sync performance and tips to help you optimize your sync times.
NOTE
Smart Ingest is co-developed by Iterable and Hightouch. Hightouch is a data processor for this feature. Smart Ingest data operations and schemas may contain the Hightouch name, but the feature is fully supported by Iterable.
Factors that impact sync performance
Smart Ingest uses difference-based change data capture (CDC). When syncs run, Smart Ingest saves a record of the data sent in the last sync run. Then, Smart Ingest uses that record to compare data from the last sync to the current query. As a result, Smart Ingest identifies what has changed and imports only new and changed data from your source.
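The comparison described above can be sketched in a few lines of Python. This is only an illustration of difference-based change data capture, not Smart Ingest's actual implementation. The `diff_rows` function, the choice to key rows by a primary key, and the sample data are all assumptions made for the example.

```python
from typing import Dict

Row = Dict[str, object]


def diff_rows(previous: Dict[str, Row], current: Dict[str, Row]) -> Dict[str, Row]:
    """Return rows that are new or changed since the last sync.

    `previous` is the record saved from the last sync run and `current`
    is the result of the current query, both keyed by a primary key
    (a hypothetical layout chosen for this sketch).
    """
    to_import = {}
    for key, row in current.items():
        if key not in previous:
            to_import[key] = row  # new row
        elif previous[key] != row:
            to_import[key] = row  # at least one value changed
    return to_import


previous = {
    "u1": {"email": "a@example.com", "plan": "free"},
    "u2": {"email": "b@example.com", "plan": "pro"},
}
current = {
    "u1": {"email": "a@example.com", "plan": "pro"},   # changed
    "u2": {"email": "b@example.com", "plan": "pro"},   # unchanged
    "u3": {"email": "c@example.com", "plan": "free"},  # new
}

to_import = diff_rows(previous, current)
print(sorted(to_import))
```

Only the changed and new rows end up in `to_import`; the unchanged row is skipped, which is why subsequent syncs usually move far less data than the first one.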
There are many factors that impact sync performance:
Sync engine (Basic or Lightning)
Sync run state - Whether the sync is running for the first time, is a subsequent sync, or is a re-sync.
Update sync mode for the Users sync type
Volume of data imported (# of rows, # of columns, size of data values)
Concurrent requests (such as another Smart Ingest sync, Iterable’s API, and other integrations)
Sync engine
A sync engine is the method used to process change data capture for your data source. You can configure your sync engine when you connect your data source to Iterable.
For some sources, Smart Ingest offers two different sync engines to process each sync: Basic and Lightning. The primary differences between these engines are:
- Where the change data capture process is performed.
- Required minimum permissions for your data source.
Each engine provides its own benefits. The Lightning sync engine provides superior sync times for large datasets, while the Basic engine is easy to set up and requires minimal read-only access.
For the initial sync and for re-syncs, processing time is similar regardless of your sync engine.
For subsequent sync runs, the Lightning sync engine processes larger amounts of data in less time than the Basic engine.
Lightning sync engine
The Lightning sync engine offers superior sync speeds because it processes change data capture in the data source instead of in Smart Ingest infrastructure. However, the Lightning sync engine requires write access to your data source, so there are additional considerations and setup.
The Lightning sync engine uses your data source's database to update two schemas, hightouch_audit and hightouch_planner, which store data related to syncs. The database user needs write access to these two schemas for the engine to function. Data stored in your data source remains unchanged.
Whenever a sync that uses the Lightning sync engine runs, Iterable creates two new tables in the hightouch_planner schema: one to log the data model's query results (the plan table) and one to log rows rejected during the sync (the rejections table). This data does not contain PII, and is used for change data capture to determine which rows to sync with each run.
Smart Ingest only keeps the two most recent pairs of tables to compute the diff for the current sync. In other words, Smart Ingest doesn't maintain historical records of these tables, and any given sync never has more than the two most recent pairs of tables. Because these tables and their names change with every sync run, Smart Ingest requires permission to write to the entire hightouch_planner schema, rather than to specific tables.
The Lightning sync engine is available for the following data sources:
Basic sync engine
If you're unable to obtain write access to your data source, the Basic sync engine is available to you.
By default, all syncs import data using the Basic sync engine. This is the default option because it requires minimal read-only permissions for your data source. The Basic sync engine performs change data capture in Smart Ingest infrastructure at the time of the sync.
Sync engines compared
Here is a summary of the differences between each sync engine:
| | Basic sync | Lightning sync |
|---|---|---|
| Performance | Normal | Quicker (up to 100 times faster) |
| Reliability | Normal | High |
| Resilience to sync interruptions | Normal | High |
| Ease of setup | Simpler | More involved |
| Location of change data capture | Smart Ingest infrastructure | Warehouse schemas managed by Smart Ingest |
| Required warehouse permissions | Read-only | Read and write |
| Ability to switch | You can move a sync to the Lightning engine at any time | You can't move a sync to the Basic engine once Lightning is configured |
Upgrading from Basic to Lightning
When you connect your source, you can choose the sync engine you want to use. You can also upgrade to Lightning later. Once on the Lightning sync engine, a source can't switch back to using the Basic sync engine.
As a general rule of thumb, a sync benefits more from the Lightning sync engine as your source data grows past the following row counts:
- 100,000 rows: The Lightning engine starts to feel fast
- 1,000,000 rows: The Basic engine begins to feel slow
- 5,000,000 rows: The Lightning engine may be required
This performance impact also varies depending on:
Data size: A high number of columns or large data values may require performance tuning at lower row counts than those noted above.
Sync schedule: Higher frequency syncs require syncs to be completed at a faster pace. A sync run must finish before the next sync run can begin.
If you continue experiencing slowness on the Basic engine after your initial sync, consider updating your source connection to use the Lightning sync engine.
If you have a large dataset, consider upgrading your source connection to use the Lightning sync engine before your initial sync.
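As a rough illustration, the rule of thumb above can be written as a small lookup. The `engine_guidance` function is hypothetical and not part of Smart Ingest. The row thresholds come from the list above, while the wording of the returned messages (in particular the message for small datasets) is an assumption. Column count, value size, and sync schedule are not modeled here.

```python
def engine_guidance(row_count: int) -> str:
    """Map a source row count to the rule-of-thumb guidance above."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count >= 5_000_000:
        return "Lightning may be required"
    if row_count >= 1_000_000:
        return "Basic begins to feel slow; consider Lightning"
    if row_count >= 100_000:
        return "Lightning starts to feel fast"
    # Assumption: below the lowest threshold, the engine choice matters less.
    return "Either engine is likely fine"


for rows in (50_000, 250_000, 2_000_000, 8_000_000):
    print(f"{rows:>9,} rows: {engine_guidance(rows)}")
```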
Instructions
If you initially set up a Basic sync and want to upgrade to Lightning sync, you can do so by following these steps:
Go to Integrations > Smart Ingest.
Find the name of the data source you want to upgrade and click Edit.
In the Sync Engine section, select Lightning.
When you select the Lightning sync engine, Smart Ingest automatically displays the additional permissions that are required for the database user. These permissions are necessary for Smart Ingest to create and manage the sync schema. Provide this information to your data source administrator so they can grant appropriate access.
Here's an example of what you might see:
Once the permissions are granted and you've entered the user's credentials, click Save Changes.
Smart Ingest automatically tests the connection. When the test is successful, click Continue.
Click Finish.
Sync runs
A sync run is a single instance of your sync importing data from your source to Iterable (determined by your sync schedule). The performance of a sync run depends on the state of your sync.
First sync
The first time a sync runs, Iterable imports every row from your data model in order to backfill and set the stage for change data capture in subsequent syncs.
Most initial syncs take less than a day. For larger datasets, it isn’t uncommon for the initial sync to take a few days, regardless of sync engine.
Subsequent syncs
On subsequent syncs, Iterable only imports data rows that have changed since the last sync. These syncs are much faster than the initial data import because they reflect incremental changes.
The Lightning sync engine processes change data capture in your data source, which provides faster sync times for large datasets.
The performance of subsequent syncs is impacted by the volume of data that has changed since the last sync. The more data that has changed, the longer the sync takes to process.
Increasing the frequency of syncs can help reduce the amount of data that needs to be processed in each sync. However, a sync run must finish before the next one can begin, so a higher frequency doesn't improve performance if a run is still in progress when the next one is scheduled to start.
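A quick back-of-the-envelope check can show whether a schedule leaves enough room for each run. The sketch below assumes a constant throughput, which real syncs don't have. The function names and the sample numbers are hypothetical and are not measured Smart Ingest figures.

```python
def estimated_run_minutes(changed_rows: int, rows_per_minute: float) -> float:
    """Rough run time, assuming constant throughput (a simplification)."""
    if rows_per_minute <= 0:
        raise ValueError("rows_per_minute must be positive")
    return changed_rows / rows_per_minute


def finishes_before_next_run(
    changed_rows: int, rows_per_minute: float, interval_minutes: float
) -> bool:
    """True if the estimated run time is shorter than the schedule interval."""
    return estimated_run_minutes(changed_rows, rows_per_minute) < interval_minutes


# Hypothetical example: 600,000 changed rows at 10,000 rows per minute
# is an estimated 60-minute run.
print(finishes_before_next_run(600_000, 10_000, interval_minutes=120))
print(finishes_before_next_run(600_000, 10_000, interval_minutes=30))
```

With these numbers, the run fits a two-hour schedule but not a 30-minute one, so scheduling it more often would not make the data arrive any sooner.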
Re-syncs
An exception to faster processing on subsequent syncs occurs when you add new fields to the data model. The sync’s change data capture only tracks the fields defined in your data model. Adding fields to the data model requires a one-time re-sync of all rows in order to import the new columns of data. Because the re-sync includes all rows, its run time is similar to that of the first sync run.
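To make the trigger concrete, here is a minimal sketch of the check described above. The function `needs_full_resync` and the field names are hypothetical. Smart Ingest performs this detection for you, and its internal logic may differ.

```python
from typing import Set


def needs_full_resync(previous_fields: Set[str], current_fields: Set[str]) -> bool:
    """True when the data model gained at least one field since the last sync.

    Rows that were already synced have no recorded value for a new field,
    so every row has to be imported again to pick it up.
    """
    return bool(current_fields - previous_fields)


last_run_fields = {"email", "first_name"}
print(needs_full_resync(last_run_fields, {"email", "first_name"}))
print(needs_full_resync(last_run_fields, {"email", "first_name", "plan"}))
```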
Rate limits and batch size
Rate limits and batch sizes are important factors that impact sync performance. These settings control the number of requests made to Iterable’s API and the amount of data sent in each request.
Rate limits are the maximum number of requests that can be made to Iterable’s API in a given time period. Rate limits are set by Iterable to ensure that requests are processed in a timely manner and to prevent overloading the Iterable API.
It's important to understand that these rate limits are aggregate limits for all requests coming into your Iterable project, not just a single Smart Ingest sync.
Batching is the process of sending multiple data records in a single request to Iterable’s API. Batching reduces the number of requests made to Iterable’s API and can improve sync performance. Batch sizes are the number of data records sent in each request.
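The following sketch shows how batching reduces the request count. The `batches` helper, the record shape, and the batch size of 1,000 are illustrative assumptions, not Smart Ingest defaults.

```python
from typing import Iterator, List, Sequence


def batches(records: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `records`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


# Hypothetical data: 2,500 records to send.
records = [{"userId": f"user-{i}"} for i in range(2500)]

batch_sizes = [len(batch) for batch in batches(records, batch_size=1000)]
print(len(batch_sizes), batch_sizes)
```

Sent one record at a time, the same data would need 2,500 requests. In batches of up to 1,000 records, it needs three.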
Making changes to rate limits and batch sizes in a sync
Some sync types have default rate limits and batch sizes that you can adjust to optimize performance. These settings change the rate and size of requests made by a given Smart Ingest sync, and don’t reflect or change the aggregate limits for using Iterable's APIs.
Scenarios where you may want to change the rate limits and batch sizes for a sync include:
Large datasets: If you have a large dataset, you may want to increase the batch size to reduce the number of requests made to Iterable’s API.
High frequency syncs: If you have a high frequency sync, you may want to adjust the rate limit to ensure that requests are processed in a timely manner.
Slow sync performance: If you are experiencing slow sync performance, you may want to adjust the rate limit and batch size to improve sync times.
API rate limits: If you are hitting Iterable’s API rate limits, you may want to adjust the rate limit and batch size to ensure that requests are processed without errors. For more information, read Troubleshooting Smart Ingest.
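To see how the two settings interact, the sketch below estimates a lower bound on sync time from a record count, a batch size, and a request rate. The function `min_sync_seconds` and all of the numbers are hypothetical. The estimate ignores request latency, retries, and other traffic that shares your project's aggregate rate limit.

```python
import math


def min_sync_seconds(record_count: int, batch_size: int, requests_per_second: float) -> float:
    """Lower bound on sync time when requests are paced at a fixed rate."""
    if batch_size < 1 or requests_per_second <= 0:
        raise ValueError("batch_size and requests_per_second must be positive")
    request_count = math.ceil(record_count / batch_size)
    return request_count / requests_per_second


# Hypothetical settings: 1,000,000 records at 5 requests per second.
small_batches = min_sync_seconds(1_000_000, batch_size=100, requests_per_second=5)
large_batches = min_sync_seconds(1_000_000, batch_size=1000, requests_per_second=5)
print(small_batches, large_batches)
```

At the same request rate, the larger batch size needs a tenth of the requests, so the lower bound on sync time drops by the same factor.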
Sync modes for Users
When you are creating a sync for the Users sync type, which imports user profile data, there are two sync modes available: update and upsert.
Upsert mode is more performant than update mode. This is because update mode first checks for a user’s existence in Iterable using a GET request before it runs the POST request with the updated user profile data. This extra operation ensures that only user records that already exist in Iterable are updated, but it adds to sync processing time.
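The cost of the extra lookup can be approximated by counting requests. This sketch assumes one POST per user in both modes and one GET per user in update mode. It ignores batching, and the function `requests_for_sync` is hypothetical rather than part of any Iterable tooling.

```python
def requests_for_sync(user_count: int, mode: str) -> int:
    """Approximate request count for a Users sync under the assumptions above."""
    if user_count < 0:
        raise ValueError("user_count must be non-negative")
    if mode == "upsert":
        return user_count        # one POST per user
    if mode == "update":
        return 2 * user_count    # one GET (existence check) plus one POST per user
    raise ValueError(f"unknown sync mode: {mode!r}")


print(requests_for_sync(10_000, "upsert"))
print(requests_for_sync(10_000, "update"))
```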
If your project is email-based and you have unique userId values in your Iterable project, you can also use different record matching to control the API used and increase performance speeds. To learn more, read Record matching in update mode for email-based projects.