Use Databricks workflows to run dbt Cloud jobs
Menu
- 1 Introduction
- 2 Set up a Databricks secret scope
- 3 Create a Databricks Python notebook
- 4 Configure the workflows to run the dbt Cloud jobs
Introduction
Using Databricks workflows to call the dbt Cloud job API can be useful for several reasons:
- Integration with other ETL processes — If you're already running other ETL processes in Databricks, you can use a Databricks workflow to trigger a dbt Cloud job after those processes are done.
- Leverages dbt Cloud job features — dbt Cloud gives you the ability to monitor job progress, manage historical logs and documentation, optimize model timing, and much more.
- Separation of concerns — Detailed logs for dbt jobs in the dbt Cloud environment can lead to more modularity and efficient debugging. This makes it easier to isolate bugs quickly while still being able to see the overall status in Databricks.
- Custom job triggering — Use a Databricks workflow to trigger dbt Cloud jobs based on custom conditions or logic that aren't natively supported by dbt Cloud's scheduling feature. This can give you more flexibility in terms of when and how your dbt Cloud jobs run.
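For orientation, the sketch below shows the kind of request a Databricks Python notebook can make against dbt Cloud's Administrative API (v2) to trigger a deploy job. The account ID, job ID, and token values are placeholders for illustration; the remaining steps of this guide cover storing the token in a Databricks secret scope and calling the notebook from a workflow.

```python
import requests

# Placeholder values for illustration only -- substitute your own account ID,
# job ID, and API token (ideally read from a Databricks secret scope).
DBT_CLOUD_HOST = "https://cloud.getdbt.com"
ACCOUNT_ID = 12345
JOB_ID = 67890
API_TOKEN = "<your dbt Cloud API token>"

# Trigger the deploy job through the dbt Cloud Administrative API v2.
response = requests.post(
    f"{DBT_CLOUD_HOST}/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"cause": "Triggered from a Databricks workflow"},
)
response.raise_for_status()

# The response payload includes the run that was just started.
run_id = response.json()["data"]["id"]
print(f"Started dbt Cloud job run {run_id}")
```

A notebook used in a production workflow would typically also poll the run status endpoint until the job finishes, so the Databricks task succeeds or fails along with the dbt Cloud job.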
Prerequisites
- Active dbt Cloud Team or Enterprise account
- An existing, configured dbt Cloud deploy job
- Active Databricks account with access to the Data Science and Engineering workspace and permission to manage secrets
- Databricks CLI
  - Note: You only need to set up your authentication. Once you have set up your Host and Token and are able to run `databricks workspace ls /Users/<someone@example.com>`, you can proceed with the rest of this guide.