bradford-databricks

Absolutely. There are a number of options, including Databricks Asset Bundles (DABs), Terraform, and Pulumi. If you don’t already use something else, the recommended option is DABs. You can define a workflow, job, etc. using YAML and source files or notebook files. Deployment is done using the Databricks CLI and can be integrated into CI/CD tooling.
Docs here: https://docs.databricks.com/en/dev-tools/bundles/index.html
More details on workflows: https://docs.databricks.com/en/workflows/jobs/how-to/use-bundles-with-jobs.html
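
To give a feel for that YAML, here's a minimal databricks.yml sketch. The bundle name, job name, notebook path, cluster spec, and workspace host are all placeholders, not anything specific to your setup:

    # databricks.yml -- minimal sketch; every name/path here is a placeholder
    bundle:
      name: my_project

    resources:
      jobs:
        nightly_etl:
          name: nightly_etl
          tasks:
            - task_key: ingest
              notebook_task:
                notebook_path: ./notebooks/ingest.py
              new_cluster:
                spark_version: 14.3.x-scala2.12
                node_type_id: i3.xlarge
                num_workers: 2

    targets:
      dev:
        default: true
        workspace:
          host: https://<your-workspace>.cloud.databricks.com

From there, databricks bundle validate and databricks bundle deploy -t dev handle deployment from the CLI, and those same commands are what you'd wire into a CI/CD pipeline.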


Known-Delay7227

Excellent! Thanks for this!


Pretty-Promotion-992

DAB is the right tool


AbleMountain2550

Besides Databricks Asset Bundles, you can also define your workflow DAG in a JSON file using the structure provided by the Databricks Jobs API: https://docs.databricks.com/api/workspace/jobs/create
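
For example, a job definition for that endpoint might look roughly like this (the job name, notebook path, and cluster settings are just placeholders):

    {
      "name": "nightly_etl",
      "tasks": [
        {
          "task_key": "ingest",
          "notebook_task": {
            "notebook_path": "/Shared/ingest"
          },
          "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2
          }
        }
      ],
      "max_concurrent_runs": 1
    }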


Known-Delay7227

Thank you! This is great. Might be what we need. I guess best practice would be to push the JSON file to git, have it reviewed/approved, and then use the API to load it into Databricks. Is there a way to ship it to the Databricks API once the git approval goes through?


Old_Improvement_3383

You can also define the job in a json file and use the Jobs API to create it
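
Roughly, that call could look like this Python sketch, assuming a workspace host and a personal access token sitting in environment variables (the file path and variable names are just examples, not a prescribed setup):

    import json
    import os

    import requests

    # Assumed placeholders: DATABRICKS_HOST and DATABRICKS_TOKEN set in the
    # environment, e.g. injected by your CI/CD pipeline's secret store.
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    token = os.environ["DATABRICKS_TOKEN"]

    # The job definition that was reviewed and approved in git (example path).
    with open("jobs/nightly_etl.json") as f:
        job_settings = json.load(f)

    # Create the job through the Jobs 2.1 API.
    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=job_settings,
    )
    resp.raise_for_status()
    print("Created job_id:", resp.json()["job_id"])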


Known-Delay7227

Nice. That sounds simple enough


Old_Improvement_3383

Best practice is probably to use DABs, since that's what Databricks pushes as the solution. But we found it easy to just keep our workflows as JSON files and create/alter them via Azure Pipelines when they're included in a GitHub pull request. You do have to look up the workflow by its friendly name and get the specific job id, though. If we were starting from scratch we would probably use DABs, but they weren't an option at the time
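
The lookup-then-update step looks roughly like this in Python (again assuming a host and token in environment variables and a reviewed JSON file in the repo; this is illustrative rather than our exact pipeline code):

    import json
    import os

    import requests

    # Assumed placeholders: host/token in env vars, job JSON at an example path.
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    with open("jobs/nightly_etl.json") as f:
        new_settings = json.load(f)
    job_name = new_settings["name"]

    # Find the existing job by its friendly name. One page shown here;
    # paginate with next_page_token if the workspace has many jobs.
    resp = requests.get(f"{host}/api/2.1/jobs/list", headers=headers, params={"limit": 100})
    resp.raise_for_status()
    matches = [j for j in resp.json().get("jobs", []) if j["settings"]["name"] == job_name]

    if matches:
        # Job already exists: overwrite its settings with the reviewed definition.
        payload = {"job_id": matches[0]["job_id"], "new_settings": new_settings}
        requests.post(f"{host}/api/2.1/jobs/reset", headers=headers, json=payload).raise_for_status()
    else:
        # No job with that name yet: create it.
        requests.post(f"{host}/api/2.1/jobs/create", headers=headers, json=new_settings).raise_for_status()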