Skip to content
CFOCoder

Running Local Notebooks on Databricks Using VS Code

Execute Jupyter notebooks locally in VS Code while leveraging Databricks compute infrastructure. Your code runs on Databricks’ serverless compute, but you edit and manage files locally—giving you the best of both worlds: local development experience with cloud compute power.

Data Science 5 min read
databricks vscode
databricks vscode

Execute Jupyter notebooks locally in VS Code while leveraging Databricks compute infrastructure. Your code runs on Databricks’ serverless compute, but you edit and manage files locally—giving you the best of both worlds: local development experience with cloud compute power.

  • Open VS Code Extensions marketplace (Ctrl+Shift+X / Cmd+Shift+X)

  • Search for “Databricks”

  • Install the official Databricks extension by Databricks

  • After installation, authenticate with your Databricks workspace

⚠️ Important: When installing the Databricks extension, let it create and manage its own virtual environment. Manually creating your own environment can trigger compatibility warnings.

Once the extension is installed:

  • Activate Serverless Compute (or select your cluster)

In Databricks Free Edition, “Serverless” is the default option

  • You should see green icons indicating active connection

  • Enable Remote Folder Sync (optional but recommended)

This opens a “sync” terminal session

  • Keeps files synchronized between local and Databricks workspace

  • Keep this terminal open for continuous sync

When you click on authenticate, you’ll be required to create a profile and input the values of the host and token, this saves your configuration to ~/.databrickscfg:

[DEFAULT]
host = https://dbc-xxxxx.cloud.databricks.com
token = dapi...

Activate the virtual environment created by Databricks extension, then install:

Terminal window
# Activate the Databricks venv (path shown in VS Code)
source /path/to/databricks-venv/bin/activate
# Install required packages
pip install databricks-connect==17.2.0

Note: Use databricks-connect version that matches your Databricks runtime. For most cases, version 17.2.0 works well.

Run the configuration command:

databricks configure --token

You’ll be prompted for:

  • Go to your Databricks workspace

  • Click your profile → Settings → Developer → Personal access tokens

  • Click Generate new token

  • Copy and save the token securely

  • Paste it when prompted by the CLI

In your notebook, create a Spark session using Databricks Connect:

from databricks.connect import DatabricksSession
# Read configuration from ~/.databrickscfg
spark = (DatabricksSession.builder
.profile("DEFAULT") # Uses config from ~/.databrickscfg
.serverless(True) # Use serverless compute
.getOrCreate())
print(f"✓ Connected! Spark version: {spark.version}")

If you prefer to specify credentials explicitly:

from databricks.connect import DatabricksSession
spark = (DatabricksSession.builder
.host("https://dbc-xxxxx.cloud.databricks.com")
.token("dapi...")
.serverless(True)
.getOrCreate())

⚠️ Security Note: Don’t hardcode tokens in notebooks you plan to share or commit to git. Use .profile("DEFAULT") instead.

Test your connection:

# Check Spark version
import pyspark
print(f"PySpark Version: {pyspark.__version__}")
# Test reading a table (if you have one)
df = spark.table("catalog.schema.table_name")
df.show(5)

  • Local: VS Code editor, notebook files live on your machine

  • Remote: Code execution happens on Databricks serverless compute

  • Sync: The extension automatically syncs your notebook to Databricks when you execute cells

  • Results: Output appears directly in VS Code

┌─────────────────┐
│ VS Code │ ← You edit here
│ (Local) │
└────────┬────────┘
│ Databricks Connect
┌─────────────────┐
│ Databricks │ ← Code runs here
│ Serverless │
└─────────────────┘

  • Open your .ipynb file in VS Code

  • Select the Databricks Python kernel

  • Click “Run All” or execute individual cells

  • Code runs on Databricks serverless compute

  • Results display in VS Code

# Unity Catalog tables
df = spark.table("catalog.schema.table_name")
# Hive metastore tables
df = spark.table("workspace.default.table_name")
# Display results
df.show(10)

The .bundle folder is created automatically by Databricks Asset Bundle:

  • What it is: Synchronized version of your project in Databricks

  • Where it syncs: /Workspace/Users/your-email/.bundle/project-name/dev/

  • Auto-sync: Updates automatically when you execute cells

  • Purpose: Version control and deployment management

You can safely ignore it. To keep it out of Git:

Terminal window
echo ".bundle/" >> .gitignore

The .bundle folder is the remote copy—your local notebook files are the source of truth.

Problemmetadata-service: Connection refused

Solution: Clear environment variables that force metadata service authentication:

import os
# Remove problematic environment variables
for var in ['DATABRICKS_AUTH_TYPE', 'DATABRICKS_METADATA_SERVICE_URL', 'DATABRICKS_SERVERLESS_COMPUTE_ID']:
os.environ.pop(var, None)
# Then create session
spark = DatabricksSession.builder.profile("DEFAULT").serverless(True).getOrCreate()

ProblemValueError: default auth: ...

Solution:

  • Verify token and hostname in ~/.databrickscfg

  • Check token hasn’t expired

  • Regenerate token if needed

  • Run databricks configure —token again

ProblemModuleNotFoundError: databricks.connect

Solution:

Terminal window
pip install databricks-connect==17.2.0

Make sure version matches your Databricks runtime.

Problem: Cell execution hangs indefinitely

Solution:

  • Check that serverless compute is enabled in your workspace

  • Verify you have available compute quota

  • Check Databricks workspace for job status

  • Try restarting the kernel

ProblemNameError: name 'spark' is not defined

Solution: Make sure you run the initialization cell first:

from databricks.connect import DatabricksSession
spark = (DatabricksSession.builder
.profile("DEFAULT")
.serverless(True)
.getOrCreate())

  • ✅ Use .profile(“DEFAULT”) to read from ~/.databrickscfg

  • ✅ Never commit tokens to git

  • ✅ Add ~/.databrickscfg to .gitignore if sharing configs

  • ✅ Rotate tokens periodically

  • ❌ Don’t hardcode tokens in notebooks

  • Local Development: Edit notebooks in VS Code

  • Test Locally: Run cells and verify output

  • Version Control: Commit to git (excluding .bundle/)

  • Production: Deploy using Databricks Asset Bundles or workspace sync

  • Use spark.sql() for complex queries instead of DataFrame operations

  • Cache intermediate results: df.cache()

  • Use display() for better visualization in Databricks

  • Monitor compute usage in Databricks workspace

your-project/
├── .bundle/ # Auto-generated, ignore in git
├── .gitignore # Add .bundle/ here
├── databricks.yml # Databricks Asset Bundle config
├── notebooks/
│ ├── analysis.ipynb
│ └── etl.ipynb
├── src/
│ └── utils.py
└── README.md

  • 📁 Your local notebook file is the source of truth

  • 🔄 Databricks syncs automatically when you execute cells

  • 📊 Use display() for enhanced Databricks visualizations

  • 🌐 Check Databricks workspace to see synced notebooks

  • 💰 Free Edition includes generous compute allocation

  • 🗂️ Don’t delete .bundle/—it’s managed by Databricks

  • 🔍 View compute logs in Databricks workspace for debugging

  • ⌨️ Use VS Code keyboard shortcuts for faster development

Terminal window
# Configure Databricks CLI
databricks configure --token
# List workspaces
databricks workspace list /
# Check current profile
databricks auth env --profile DEFAULT
# Install packages
pip install databricks-connect==17.2.0

# Cell 1: Initialize Spark
from databricks.connect import DatabricksSession
spark = (DatabricksSession.builder
.profile("DEFAULT")
.serverless(True)
.getOrCreate())
print(f"✓ Spark {spark.version} ready")
# Cell 2: Your analysis
df = spark.table("catalog.schema.table")
df.show()

  • Databricks Connect Documentation

  • VS Code Extension Guide

  • Databricks Asset Bundles

  • Databricks CLI Reference

With this setup, you get:

  • 🚀 Local development with VS Code’s powerful editor

  • ☁️ Cloud compute without local resource constraints

  • 🔄 Automatic sync between local and Databricks

  • 📊 Full Databricks features (Unity Catalog, Delta Lake, etc.)

  • 💻 Familiar workflow like regular Jupyter notebooks

I found this video tutorial very helpful and very well explained about the general process of installing the Databricks extension in VS Code, although some of the details differ from my experience above, but even with that, it gives a good idea about how to install and use the extension.

That’s it. Happy coding with Databricks from VS Code! 🚀