Duckle: The Local-First Desktop Data Pipeline Studio You Need

Discover how Duckle's visual builder, 290+ connectors, and local AI assistant streamline your data workflows, replacing heavy ETL and fragile spreadsheets.

May 28, 2026

Overview

Duckle is a local-first desktop data pipeline studio intended to replace heavy cloud ETL and fragile spreadsheet workflows. It provides a visual drag-and-drop builder with 290+ connectors (files, databases, SaaS APIs, streaming, vector stores, and more). Every pipeline compiles to readable SQL that a columnar engine (DuckDB by default) executes. A built-in AI assistant, Duckie, runs entirely locally: describe a pipeline in plain English and it drops the corresponding graph onto the canvas. The single binary (~30 MB) downloads its engines on first launch. Workspaces are plain text files, so they are diffable and Git-ready. Duckle operates fully offline, supports 60 UI languages, and is MIT/Apache-2.0 licensed. It targets single-machine workloads; for larger data, direct the outputs to a warehouse or data lake.

Installation and First Launch

OS	Asset	How to run
Windows	`Duckle-windows-x64.exe`	Double‑click; bypass SmartScreen with More info → Run anyway.
macOS (Apple Silicon)	`Duckle-macos-arm64`	`chmod +x` then run; first time right‑click → Open to bypass Gatekeeper.
Linux (x86_64)	`Duckle-linux-x64`	`chmod +x` then run; requires `libwebkit2gtk-4.1-0`.

On first launch, a setup modal guides you to install the DuckDB CLI (required) and optionally the Duckie AI model (~1.1 GB). Choose a workspace folder where all pipelines, connections, and contexts are stored as plain files. A 60‑second quickstart: drag a CSV source, wire a filter, add a Parquet sink, and press Run. Alternatively, click the sparkles icon, type a description, and insert the generated graph.
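To make "pipelines compile to readable SQL" concrete, here is a minimal sketch of what the quickstart graph (CSV source, filter, Parquet sink) could turn into. The function names, file names, and the exact statement shape are assumptions for illustration; the article does not show the SQL Duckle actually generates. The sketch only relies on DuckDB's `read_csv_auto` and `COPY ... TO ... (FORMAT PARQUET)`.

```python
# Hypothetical sketch: NOT Duckle's compiler or its real output.
# It shows one plausible DuckDB statement for CSV -> filter -> Parquet.


def quote(value: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def compile_quickstart(csv_path: str, predicate: str, parquet_path: str) -> str:
    """Build a single COPY statement: read the CSV, filter rows, write Parquet.

    `predicate` is inserted verbatim, so it must come from a trusted source.
    """
    return (
        "COPY (\n"
        "  SELECT *\n"
        f"  FROM read_csv_auto({quote(csv_path)})\n"
        f"  WHERE {predicate}\n"
        f") TO {quote(parquet_path)} (FORMAT PARQUET);"
    )


# Example file names and filter are made up for the demonstration.
sql = compile_quickstart("orders.csv", "amount > 0", "orders_clean.parquet")
print(sql)
```

A statement like this can be pasted into the DuckDB CLI to check a stage by hand, which is the practical benefit of a pipeline that is plain SQL underneath.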

Building Pipelines

The everyday workflow: add sources, chain transforms (filter, join, aggregate, AI enrichment, cleaning), insert validators (not-null, uniqueness, regex) that route failing rows to a reject port, and finish with sinks (files, databases, object storage, email). Run the graph; the Output panel shows row counts and timing. The AI assistant can also stream a graph from a natural-language description, and you can edit the generated nodes afterwards. Encrypted connections are reusable, and context variables (`${var}`) handle environment switching. Example recipes include CSV cleanup, Postgres-to-Snowflake nightly loads, RAG ingestion, and Slack digest pipelines. Ready-to-use samples live in the `samples/` directory.
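The validator-plus-reject-port idea can be sketched in a few lines. This is an illustration of the routing pattern only, under the assumption that rows are plain dictionaries; the helper names (`not_null`, `matches`, `unique`, `validate`) and the `_reject_reason` field are hypothetical and do not come from Duckle.

```python
import re
from typing import Callable, Optional

Row = dict
# A validator returns None when a row passes, or a reason string when it fails.
Validator = Callable[[Row], Optional[str]]


def not_null(column: str) -> Validator:
    def check(row: Row) -> Optional[str]:
        return None if row.get(column) is not None else f"{column} is null"
    return check


def matches(column: str, pattern: str) -> Validator:
    compiled = re.compile(pattern)

    def check(row: Row) -> Optional[str]:
        value = row.get(column)
        ok = isinstance(value, str) and compiled.fullmatch(value) is not None
        return None if ok else f"{column} does not match {pattern}"
    return check


def unique(column: str) -> Validator:
    seen = set()  # stateful: remembers values from rows checked earlier

    def check(row: Row) -> Optional[str]:
        value = row.get(column)
        if value in seen:
            return f"{column} duplicates {value!r}"
        seen.add(value)
        return None
    return check


def validate(rows, validators):
    """Split rows into (passed, rejected); rejects carry the first failure reason."""
    passed, rejected = [], []
    for row in rows:
        # Validators run in order and stop at the first failure.
        reason = next((r for r in (v(row) for v in validators) if r is not None), None)
        if reason is None:
            passed.append(row)
        else:
            rejected.append({**row, "_reject_reason": reason})
    return passed, rejected


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 1, "email": "b@example.com"},
    {"id": 3, "email": "not-an-email"},
]
checks = [not_null("email"), matches("email", r"[^@\s]+@[^@\s]+\.[^@\s]+"), unique("id")]
passed, rejected = validate(rows, checks)
```

Keeping the failure reason on each rejected row means the reject output can be written to its own sink and inspected later, instead of silently dropping bad records.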

Workspace, Git, and Scheduling

A workspace folder contains `pipelines/`, `connections/`, `contexts/`, `routines/`, `documents/`, `schedules.json`, and `run-history/`. Everything is plain text, so changes show up cleanly in `git diff`. The top-bar Git icon opens a panel for status, staging, commit, push/pull, and branch management. Push and pull use your system credential helper; on an HTTP 401, Duckle prompts for a personal access token, which is stored AES-encrypted in the workspace. Scheduling is configured in the Schedule panel with cron, interval, or file-watch triggers. Schedules persist to `schedules.json` and fire only while Duckle is open. A headless CLI mode is planned for the 1.0 release.

Configuration Options

Setting	Where	Effect
Theme	Top‑bar sun/moon toggle	Light/dark, persisted to `localStorage`
Workspace	Top‑bar workspace pill → Switch	Change active workspace folder
Active engine	Top‑bar engine selector	Choose DuckDB (default) or SlothDB
Active context	Top‑bar context dropdown	Resolve context variables at run time
AI base URL	`baseUrl` prop on AI nodes	Point at any OpenAI‑compatible endpoint (default: local Duckie)
Per‑stage retry	Properties → Advanced tab	Number of attempts and linear backoff
Per‑stage memory cap	Properties → Advanced tab	Applies `PRAGMA memory_limit` to that stage
DuckDB extensions	Pre‑fetched at install; `spatial` lazy‑loaded	Avoids network pauses mid‑pipeline
`RUST_LOG`	Environment variable before launch	Set to `debug` for verbose engine logs
`DUCKLE_DUCKDB_BIN`	Environment variable for tests	Points integration tests at a specific DuckDB CLI
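The per-stage retry setting combines a number of attempts with linear backoff. The sketch below shows what that policy means in general terms; it is not Duckle's implementation, and the function name, parameters, and the base delay are assumptions made for the example. With linear backoff the wait grows by a fixed step after each failure (1x, 2x, 3x the base delay) rather than doubling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    stage: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call stage() up to `attempts` times, waiting base_delay * n after the n-th failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return stage()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * attempt)  # linear growth: 1x, 2x, 3x, ...
    raise AssertionError("unreachable")


# Demonstration with a stage that fails twice before succeeding.
calls = {"n": 0}
delays = []  # records the waits instead of actually sleeping


def flaky_stage() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = run_with_retry(flaky_stage, attempts=4, base_delay=0.5, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; a production version would normally retry only on errors known to be transient.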

Constraints, Best Practices, and Key Procedures

Constraints: single-machine scope; no headless CLI yet; the local AI model (a 1.5B-parameter Qwen model) may need several iterations for complex graphs; some connectors are still Preview or Planned; no real-time collaboration; unsigned binaries require a SmartScreen or Gatekeeper bypass; Linux needs `libwebkit2gtk-4.1-0`; engine downloads are large (the optional Duckie model alone is ~1.1 GB).

Best practices: use Parquet for intermediates, push filters early, leverage built‑in vector/full‑text search, prefetch lazy extensions, batch AI calls, cap memory on heavy aggregates, use checkpoints, disable debug logging, sort once at the end, and clean data before AI enrichment.

Procedures: engines install to the app-data directory (`%APPDATA%\io.duckle.app\engines\` on Windows, etc.). Building from source requires `--features custom-protocol`. To release, bump the version in `tauri.conf.json`, commit, tag, and push. Keep `.duckle/keys/` out of Git. For connectivity problems, adjust the SSL mode or pre-install extensions with `duckdb :memory: -c "INSTALL spatial; LOAD spatial;"`. The commands for building from source are:

```bash
git clone https://github.com/SouravRoy-ETL/duckle
cd duckle
npm --prefix frontend install

# Development (hot‑reload)
cargo tauri dev

# Release build (must include --features custom-protocol)
cargo build --release --manifest-path apps/desktop/Cargo.toml --features custom-protocol
```
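The procedures above advise keeping `.duckle/keys/` out of Git. One way to enforce that is a small helper that adds the path to the workspace's `.gitignore` if it is missing. This is a sketch, not part of Duckle: the function name is hypothetical, and it assumes `.duckle/keys/` is relative to the workspace root, which the article does not state explicitly.

```python
from pathlib import Path

# Path named in the procedures; assumed to be relative to the workspace root.
KEYS_ENTRY = ".duckle/keys/"


def ensure_ignored(workspace: Path, entry: str = KEYS_ENTRY) -> bool:
    """Append `entry` to <workspace>/.gitignore if absent; return True if the file changed."""
    gitignore = workspace / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in existing.splitlines()):
        return False  # already ignored, nothing to do
    # Start the entry on a new line if the file lacks a trailing newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + separator + entry + "\n", encoding="utf-8")
    return True
```

Calling `ensure_ignored(Path("path/to/workspace"))` once after creating a workspace is enough; later calls leave the file untouched.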