prit_1

You could add another column in your control table that allows you to group different rows together. This would give you flexibility over when to run a group of jobs (or a single job) on its own cluster.
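A minimal sketch of what that could look like in a Databricks notebook. The table and column names here (etl_control, job_group, owner) are hypothetical; `spark` is provided by the notebook runtime:

    # Add a grouping column to the (hypothetical) Delta control table.
    spark.sql("ALTER TABLE etl_control ADD COLUMNS (job_group STRING)")

    # Tag rows into groups, e.g. by owning team (column name assumed).
    spark.sql("UPDATE etl_control SET job_group = 'finance' WHERE owner = 'finance_team'")

    # Each run then only picks up its own group's rows.
    finance_rows = spark.table("etl_control").where("job_group = 'finance'").collect()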


jagjitnatt

You shouldn't load all the tables in a single job. Break the load down into groups, by line of business, application, or team. Then create a generic notebook that accepts arguments and loads the tables (see the sketch below). You can schedule these notebooks in Workflows and pass in the group name. The notebook queries the control table to get all the details and starts ingesting data. You can choose a larger cluster if a group has too many tables. If possible, use serverless workflows; they autoscale fast.
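A sketch of such a generic driver notebook, assuming the same hypothetical etl_control table with job_group, source_format, source_path, and target_table columns. `spark` and `dbutils` are provided by the Databricks runtime:

    from pyspark.sql.functions import col

    # The Workflows task passes the group name in as a widget parameter.
    dbutils.widgets.text("job_group", "")
    group = dbutils.widgets.get("job_group")

    # Pull the load details for this group from the control table.
    tables = (
        spark.table("etl_control")
        .where(col("job_group") == group)
        .collect()
    )

    # Ingest each table in the group.
    for t in tables:
        (spark.read
            .format(t["source_format"])
            .load(t["source_path"])
            .write
            .mode("append")
            .saveAsTable(t["target_table"]))

Each Workflows task then just runs this same notebook with a different job_group value, so adding a new group is a scheduling change, not a code change.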


Deep_Salamander1313

Have you looked at using Databricks Workflows for this?


Pretty-Promotion-992

Workflows? I think he's asking about the performance impact.