Smart Ingest lets you pull data stored in your Databricks warehouse and push it to Iterable. Supported warehouses include serverless and classic Databricks SQL warehouses.
NOTE
Contact your Iterable customer success manager to discuss adding Smart Ingest to your plan.
Smart Ingest can only import data from Databricks.
Connection requirements
Before you can connect Smart Ingest to Databricks, you need to do the following:
Allowing Smart Ingest IPs in Databricks
Before connecting, make sure you have allowed Smart Ingest's IP addresses in Databricks.
Databricks connection details
Smart Ingest connects to Databricks using Open Database Connectivity (ODBC).
To connect to Databricks, you need the following details:
- Server Hostname
- Port - The default port number is 443, but yours may be different.
- HTTP Path
- (Optional) Catalog - Catalogs are the first level of Unity Catalog's three-level namespace. If you specify a catalog, only schemas from that catalog are available. (See the example after this list.)
- Schema
- SQL Dialect - Databricks SQL (default) or ANSI SQL-92. By default, Smart Ingest assumes that your queries use the Databricks SQL dialect. You may wish to override this behavior if your queries use legacy ANSI SQL-92 syntax. (Some features are unavailable when legacy syntax is used.)
- Access Token
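To see how the catalog and schema settings fit together, here is a minimal sketch of a query against Unity Catalog's three-level namespace. The catalog, schema, table, and column names below are hypothetical; substitute your own:

SELECT *
FROM main.marketing.subscriptions  -- <catalog>.<schema>.<table>
WHERE email IS NOT NULL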
Connecting Databricks to Smart Ingest
Step 1: Set up Databricks
This section walks you through getting connection details for your Databricks cluster and using your credentials to connect to Smart Ingest.
In your Databricks account console, go to the Workspaces page. Select the relevant workspace and then click Open Workspace.
In your workspace, go to the Compute page and click the relevant cluster. This brings you to its Configuration tab.
At the bottom of the page, expand the Advanced Options toggle and open the JDBC/ODBC tab.
This tab displays your cluster's Server Hostname, Port, and HTTP Path. Keep the tab open, or save these details to a secure location.
Create a Personal Access Token by following the Databricks guide.
Once you've saved these Databricks credentials, you're ready to set up the connection in Iterable.
Step 2: Connect to Databricks
To connect your Databricks warehouse to Smart Ingest:
Log in to Iterable as a user with the Manage Integrations project permission and open the project you’re working on.
Go to Integrations > Smart Ingest.
Click Connect a New Source.
Select Databricks, then click Continue.
In Step 1, enter your connection details for Databricks:
- Server hostname
- Port
- HTTP path
- Catalog (optional)
- Schema
- SQL dialect: select Databricks SQL (default) or ANSI SQL-92
In Step 2, choose either the Lightning or Basic sync engine. Lightning is faster and more efficient, but requires the database user to have write access. Basic is slower and less efficient, but doesn't require write access.
To view the additional permissions required for the database user, select the Lightning sync engine. These permissions are necessary for Smart Ingest to create and manage the sync schema, and are customized based on the other inputs you provide in the source setup form.
To learn more about sync engines, read Optimizing Smart Ingest Sync Performance.
NOTE
Smart Ingest is co-developed by Iterable and Hightouch. Hightouch is a data processor for this feature. Smart Ingest data operations and schemas may contain the Hightouch name, but the feature is fully supported by Iterable.
In Step 3, enter your Access Token.
Click Continue. Smart Ingest automatically tests the connection.
When the connection test is successful, click Continue. (If there are problems connecting, click Back and review the connection details for accuracy.)
Add a name for your data source. This name is displayed in Iterable.
Click Finish.
Next steps
You've now connected your data warehouse to Smart Ingest. Next, you can create a sync.
Troubleshooting Databricks
Unity Catalog support
Smart Ingest integrates with Databricks' Unity Catalog feature. To use it, make sure to specify a catalog during initial source configuration.
Subquery error
Some Databricks models return the following error:
java.lang.IllegalArgumentException: requirement failed: Subquery subquery#485, [id=#937] has not finished {...}.
This error can occur when the model query contains a subquery, especially when the subquery has an ORDER BY clause. For example:
SELECT user_query.*
FROM (
  SELECT * FROM default.subscriptions_table ORDER BY last_name
) user_query
The best way to resolve the error is to rewrite your model query to remove subqueries. Common table expressions (CTEs) are supported as an alternative.
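For example, here is a minimal sketch of how the query above could be restructured as a CTE, with the ORDER BY moved to the outer query so no ordering happens inside a nested query. The table and column names come from the example above; adjust them to match your own model:

WITH user_query AS (
  SELECT * FROM default.subscriptions_table
)
SELECT *
FROM user_query
ORDER BY last_name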
Error connecting to the database. Unauthorized/Forbidden: 403
This error can occur when testing the source connection or when running a sync that uses a Databricks model. It is typically caused by an expired Databricks access token. To resolve it, generate a new access token and update the source configuration.