Evidence
In the past year , databricks
, delta
, dashboard
, duckdb
, wasm
, polars
, iceberg
, flink
, rust
… one thing led to another and I ended up landing on evidence.dev
. The documentation is clean and to the point. But requires some effort to get started with other data sources. It took me a few hours to get the data from databricks
because I missed some of the environment variables settings for databrics. Getting duckdb
to work is easy. Otherwise, it was a breeze to get started with evidence.dev
.
Installation
Refer to the installation guide for more details.
Customisation
Add a file page/+layout.svelte
to customise the layout of the page. There is further documentation on the layout page.
A basic layout file for the common tasks
- Remove queries
- Full width
- Title
+layout.svelte
<EvidenceDefaultLayout
{data}
title="Vish"
fullWidth={true}
neverShowQueries={true}
builtWithEvidence={true}
>
<slot slot="content" />
</EvidenceDefaultLayout>
Databricks
Databricks SQL Warehouse is a cloud-based data warehousing solution from Databricks. It is a fully managed, serverless, and highly scalable data warehouse that allows you to query and analyze data using standard SQL. Evidence can use Databricks as a data source.
Add Databricks as a data source
Add a folder sources/databricks_admin_workspace
and create a file pages/databricks_admin_workspace/connection.yaml
with the following content.
I have used one source for workspace as there might be RBAC per workspace and Unity Catalog. This page uses the System Tables in databricks to visualise the usage, cost and other metrics.
Other sources can be added for different workspaces that could be used to create BI dashboards for different teams.
sources/databricks_admin_workspace/connection.yaml
name: databricks_admin_workspace
type: databricks
Environment Variables
I did spend some time to get the databricks to work. It was my mistake to not read the documentation properly. The environment variables are required to be set for databricks.
.env / .envrc
export EVIDENCE_SOURCE__databricks_admin_workspace__host="adb-<workspace-id>.azuredatabricks.net"
export EVIDENCE_SOURCE__databricks_admin_workspace__port="443"
export EVIDENCE_SOURCE__databricks_admin_workspace__path="/sql/1.0/warehouses/<warehouse_id>"
export EVIDENCE_SOURCE__databricks_admin_workspace__token="<personal_access_token>"
Source Query
Refer to the Source Queries for more details. To use the system.billing.usage
table in databricks, the following query can be used. The source queries only select the columns that are required for the visualisation. It is possible to have aggregations and other transformations in the source query. The transformations can be SQL in duckdb queries or called SQL Queries
sources/databricks_admin_workspace/billing_usage.sql
select usage_date as dt, usage_start_time as start, workspace_id as id,
sku_name, usage_quantity
from system.billing.usage
BI as Code
The main USP of evidence.dev is the ability to write SQL queries and visualise the data. What drew me most towards evidence is the ability to use Markdown
and SQL
to create static pages ( with dynamic behaviour) for BI dashboards. Some of the advantages
- Only requires knowledge of SQL and Markdown
- Unit tests can be written for the SQL queries.
- CI/CD and BI as Code
- Git Ops
- Version Control
- Audit
- Reproducibility
SQL Queries
Refer to docs.evidence.dev. To create data for a bar chart, we must create a sql query refers to the tables
created using the source queries
. We can use databricks_admin_workspace.billing_usage
as a table.
1.Add a file queries/daily_usage_today.sql
queries/daily_usage_today.sql
select date_trunc('hour',start) as dt, id, sum(usage_quantity) as cost
from databricks_admin_workspace.billing_usage where dt = current_date
group by all
2.Add the query in the frontmatter of the page
pages/index.md
---
title: Cost Monitoring
queries:
- daily_usage_today.sql
---
Visualisation
With the source queries and sql queries in place, we can now visualise the data using evidence ui components
<BarChart
data={daily_usage_today}
title='Daily Usage Today'
x=dt
y=cost
series=name
type=grouped
/>
And that's it. We have a dashboard for databricks.
CI / CD
There are multiple ways to deploy which depends on the infrastructure. The basics are
- Build using
npm run sources && npm run build
- Package the static files generated in the folder
build
folder. - Deploy the static files to the server.
- Update as required as evidence is a static site generator.
Source Code
The source code for PoC Project can be found here