We’re excited to announce that Microsoft Purview now supports lineage tracking for Azure Databricks Unity Catalog. This feature strengthens data governance by letting users track how data flows across their Azure Databricks notebooks. As more data moves through cloud platforms like Azure Databricks, visibility into where data originates and how it changes is important for compliance and operational excellence.
Data lineage is the practice of tracking the origins, movements, and transformations of data as it moves through systems and processes. It helps answer questions such as where the data comes from, how it is used, and who has modified it. In Azure Databricks Unity Catalog, lineage shows how data flows through notebooks, helping users understand the data lifecycle and ensure compliance with governance policies.
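To make the "where does the data come from" question concrete, here is a minimal sketch that treats table-level lineage as a directed graph and walks it backwards. The table names and the `upstream_sources` helper are hypothetical and only illustrate the idea; this is not how Microsoft Purview or Unity Catalog stores lineage internally.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source_table, target_table).
EDGES = [
    ("raw.orders", "clean.orders"),
    ("raw.customers", "clean.customers"),
    ("clean.orders", "reports.revenue"),
    ("clean.customers", "reports.revenue"),
]


def upstream_sources(target, edges):
    """Return every table that feeds `target`, directly or indirectly."""
    # Map each table to the set of tables it reads from.
    parents = defaultdict(set)
    for source, dest in edges:
        parents[dest].add(source)

    # Depth-first walk from the target back towards the raw sources.
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Calling `upstream_sources("reports.revenue", EDGES)` returns all four upstream tables, which is the kind of answer a lineage view gives visually.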
Microsoft Purview can capture lineage at the table/view level and the column level in Unity Catalog. To enable lineage tracking, users need to enable the system.access schema in Unity Catalog and ensure the scanning account has SELECT privileges on specific system tables.
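The privilege setup can be sketched as a small helper that generates the SQL grants for the scanning account. This assumes the relevant system tables are `system.access.table_lineage` and `system.access.column_lineage`, and that the account also needs `USE CATALOG` and `USE SCHEMA` on the parent objects. The announcement does not name the tables, so check both assumptions against the current documentation. The `build_lineage_grants` function and the principal name are hypothetical.

```python
# Assumed lineage tables; the announcement only says "specific system tables".
LINEAGE_TABLES = [
    "system.access.table_lineage",
    "system.access.column_lineage",
]


def build_lineage_grants(principal):
    """Build GRANT statements that let `principal` read the lineage tables."""
    if not principal or "`" in principal:
        raise ValueError("principal must be non-empty and contain no backticks")
    quoted = f"`{principal}`"

    # Assumption: the principal needs access to the parent catalog and schema
    # before SELECT on the individual tables takes effect.
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted};",
    ]
    for table in LINEAGE_TABLES:
        statements.append(f"GRANT SELECT ON TABLE {table} TO {quoted};")
    return statements


if __name__ == "__main__":
    # Hypothetical scanning identity used for illustration only.
    for statement in build_lineage_grants("purview-scanner"):
        print(statement)
```

The generated statements would be run by a metastore admin in a Databricks SQL editor or notebook. The helper only builds strings and does not connect to any workspace.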
To fetch lineage during scans in Microsoft Purview, users need to enable Lineage Extraction when setting up the scan for Azure Databricks. After running a scan, data from Azure Databricks Unity Catalog will appear in the Microsoft Purview Data Map, providing a unified view of data sources and transformations.
Visual comparisons between Azure Databricks