# Databricks

Databricks is a unified data lakehouse platform for analytics and AI. Connecting Databricks to Duvo lets your assignments run SQL queries against your SQL Warehouse, explore your data catalog, and pull results into automated workflows.

## Setup

### Prerequisites

* A Databricks workspace with at least one running SQL Warehouse.
* A Databricks personal access token (PAT). To generate one, go to **User Settings > Developer > Access tokens** in your Databricks workspace and click **Generate new token**. Copy the token immediately — it cannot be retrieved later.
* The connection details for your SQL Warehouse. To find them, open your Databricks workspace, click **SQL Warehouses** in the sidebar, select your warehouse, and open the **Connection details** tab.

### Required Permissions

* **Databricks SQL entitlement** — Your user account must have the Databricks SQL entitlement enabled in the workspace.
* **CAN USE** permission on the target SQL Warehouse — This is the minimum access level required to connect and run queries.
* **Data access** — Your account must have SELECT (or higher) privileges on the catalogs, schemas, and tables you want to query.
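For the data-access requirement, an administrator in a Unity Catalog workspace can issue standard SQL grants. The sketch below is a hypothetical helper that produces those statements; the catalog, schema, table, and principal names are placeholders. Note that **CAN USE** on the warehouse itself is granted through the workspace UI or the Permissions API, not through SQL.

```python
# Hypothetical Unity Catalog grants covering the data-access requirement.
# The catalog/schema/table and principal names are placeholders; an admin
# would run the statements through any SQL session. Warehouse CAN USE is
# managed separately (workspace UI or Permissions API), not via SQL.

def select_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """SQL statements giving `principal` read access to one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`;",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`;",
    ]

for stmt in select_grants("main", "sales", "orders", "analyst@example.com"):
    print(stmt)
```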

### Connection Fields

| Field                     | Description                                                                                                                                            |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Server Hostname**       | Your Databricks workspace hostname (e.g., `dbc-abc123.cloud.databricks.com`). Found on the **Connection details** tab of your SQL Warehouse.           |
| **HTTP Path**             | The SQL Warehouse HTTP path (e.g., `/sql/1.0/warehouses/abc123`). Found on the **Connection details** tab of your SQL Warehouse.                       |
| **Personal Access Token** | Your Databricks personal access token (starts with `dapi`). Generated from **User Settings > Developer > Access tokens** in your Databricks workspace. |
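As a sanity check, the three fields can be validated before connecting. The sketch below uses the open-source `databricks-sql-connector` Python package for the (commented-out) real connection; the hostname, path, and token values are placeholders to be replaced with the values from your warehouse's **Connection details** tab.

```python
# Connection-field sketch. The hostname, HTTP path, and token below are
# placeholders; substitute the values from your SQL Warehouse's
# Connection details tab. Real connections use the
# databricks-sql-connector package (pip install databricks-sql-connector).
import os


def validate_connection_fields(server_hostname: str, http_path: str,
                               access_token: str) -> list[str]:
    """Return a list of problems with the supplied connection fields."""
    problems = []
    if not server_hostname or "://" in server_hostname:
        problems.append("Server Hostname should be a bare hostname, without https://")
    if not http_path.startswith("/sql/"):
        problems.append("HTTP Path for a SQL Warehouse normally starts with /sql/")
    if not access_token.startswith("dapi"):
        problems.append("Personal access tokens normally start with dapi")
    return problems


fields = {
    "server_hostname": os.environ.get("DATABRICKS_HOST", "dbc-abc123.cloud.databricks.com"),
    "http_path": os.environ.get("DATABRICKS_HTTP_PATH", "/sql/1.0/warehouses/abc123"),
    "access_token": os.environ.get("DATABRICKS_TOKEN", "dapi-placeholder"),
}

issues = validate_connection_fields(**fields)
if not issues:
    # Uncomment to open a real session once credentials are in place:
    # from databricks import sql
    # with sql.connect(**fields) as conn, conn.cursor() as cur:
    #     cur.execute("SELECT 1")
    #     print(cur.fetchall())
    pass
```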

### Third-Party Documentation

* [Generate a personal access token](https://docs.databricks.com/aws/en/dev-tools/auth/pat) — Step-by-step guide for creating PATs.
* [Get connection details for a compute resource](https://docs.databricks.com/aws/en/integrations/compute-details) — How to find your Server Hostname and HTTP Path.
* [SQL warehouse access control](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html) — Managing CAN USE and other warehouse permissions.

## Capabilities

* **Run SQL queries** — Execute analytical queries directly against your Databricks SQL Warehouse and retrieve results.
* **Explore your data catalog** — Browse catalogs, schemas, and tables using discovery queries to understand available datasets.
* **Inspect table structures** — View column definitions, data types, and metadata for any table in your lakehouse.
* **Pull business metrics** — Extract KPIs, performance data, and aggregated insights for reports and downstream workflows.
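The catalog-exploration and table-inspection capabilities above boil down to a handful of discovery queries. This sketch builds those statements as strings (to be run through an open connector cursor); it assumes a Unity Catalog workspace where the `information_schema` views are available, and the `main.sales.orders` table is hypothetical.

```python
# Discovery-query sketch: statements for browsing a Unity Catalog
# workspace. Assumes information_schema views are available. Queries are
# built as strings here; run them through an open connector cursor.

LIST_CATALOGS = "SHOW CATALOGS"
LIST_SCHEMAS = "SHOW SCHEMAS IN {catalog}"
LIST_TABLES = "SHOW TABLES IN {catalog}.{schema}"


def describe_columns_query(catalog: str, schema: str, table: str) -> str:
    """Query for the column names, types, and comments of one table."""
    return (
        "SELECT column_name, data_type, comment "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}' "
        "ORDER BY ordinal_position"
    )


# Example: inspect a hypothetical main.sales.orders table.
print(describe_columns_query("main", "sales", "orders"))
```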

## Key Benefits

* **Lakehouse access** — Query your unified data platform directly from assignments without manual SQL sessions.
* **Familiar SQL interface** — Use standard SQL to retrieve exactly the data you need from structured and semi-structured sources.
* **Secure authentication** — Connects through a personal access token that carries your user's identity and permissions, keeping data access controlled.
* **Large result handling** — Streams query results efficiently using Arrow format, supporting datasets with millions of rows.
* **Data-driven automation** — Make intelligent workflow decisions based on live lakehouse data rather than stale exports.

## Works Well With

* **Google Sheets or Excel** — Query Databricks for raw data, then push summarized results into a spreadsheet for stakeholder review.
* **Slack or Email** — Run scheduled analytical queries and deliver key metrics or alerts directly to your team's communication channels.
* **Snowflake or BigQuery** — Combine data from multiple warehouse platforms in a single assignment to build cross-system reports.
