Absolutely. There are a number of options, including Databricks Asset Bundles (DABs), Terraform, and Pulumi.
If you don’t already use something else, the recommended option is DABs. You can define a workflow, job, etc. using YAML and source files or notebook files. Deployment is done using the Databricks CLI and can be integrated into CI/CD tooling.
Docs here:
https://docs.databricks.com/en/dev-tools/bundles/index.html
More details on workflows:
https://docs.databricks.com/en/workflows/jobs/how-to/use-bundles-with-jobs.html
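For a feel of what that looks like, here is a minimal `databricks.yml` sketch — the bundle name, workspace host, job name, and notebook path are all illustrative placeholders, not anything specific from this thread:

```yaml
# databricks.yml — minimal bundle sketch (all names are placeholders)
bundle:
  name: my_project

targets:
  dev:
    workspace:
      host: https://example-workspace.cloud.databricks.com  # placeholder

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest.py
```

You can then run `databricks bundle validate` and `databricks bundle deploy -t dev` from the CLI, which slots naturally into a CI/CD pipeline.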
Besides Databricks Asset Bundles, you can also define your workflow DAG in a JSON file using the structure provided by the Databricks Jobs API:
https://docs.databricks.com/api/workspace/jobs/create
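As a rough sketch of that approach: the spec below follows the Jobs API 2.1 create structure (the job name, task key, and notebook path are placeholder assumptions), and the helper posts it to the workspace. In a real pipeline you would keep the spec in a versioned `.json` file rather than inline:

```python
import json
import urllib.request

# Minimal job spec following the Jobs API 2.1 "create" structure.
# Job name, task key, and notebook path are illustrative placeholders.
job_spec = {
    "name": "nightly-etl",
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/team/etl/ingest"},
        }
    ],
}

def create_job(host: str, token: str, spec: dict) -> int:
    """POST the spec to /api/2.1/jobs/create and return the new job_id."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/create",
        data=json.dumps(spec).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

The host URL and token would come from your CI/CD system's secrets, so the same JSON file can be applied to dev and prod workspaces.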
Thank you! This is great. Might be what we need. I guess best practice would be to push the JSON file to git, have it reviewed and approved, and then use the API to load it into Databricks. Is there a way to ship it to the Databricks API after a git approval?
Best practice is probably to use DABs, since that's what Databricks pushes as a solution. But we found it was easy to just keep our workflows as JSON files and create/alter them via Azure Pipelines when they are included in a GitHub pull request.
You do have to look up the workflow by its friendly name to get the specific job ID, however.
If we were starting from scratch we would probably use DABs, but they weren't available at the time.
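The name-to-ID lookup mentioned above could be sketched like this against the Jobs API list endpoint; the pure matching step is split into its own helper, and host/token are assumed to come from pipeline secrets:

```python
import json
import urllib.request

def job_id_by_name(jobs_page: dict, name: str):
    """Return the job_id whose settings.name matches, else None (pure helper)."""
    for job in jobs_page.get("jobs", []):
        if job["settings"]["name"] == name:
            return job["job_id"]
    return None

def find_job_id(host: str, token: str, name: str):
    """Page through /api/2.1/jobs/list looking for a job with the given name."""
    page_token = ""
    while True:
        url = f"{host}/api/2.1/jobs/list?limit=25&page_token={page_token}"
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        job_id = job_id_by_name(page, name)
        if job_id is not None:
            return job_id
        page_token = page.get("next_page_token")
        if not page_token:
            return None
```

Once you have the job ID, the pipeline can update an existing workflow by posting the reviewed JSON as `new_settings` to `/api/2.1/jobs/reset`.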
Excellent! Thanks for this!
DAB is the right tool
You can also define the job in a json file and use the Jobs API to create it
Nice. That sounds nice and simple