Duckle: The Local-First Desktop Data Pipeline Studio You Need

Discover how Duckle's visual builder, 290+ connectors, and local AI assistant streamline your data workflows, replacing heavy ETL and fragile spreadsheets.

Explore Duckle, a local-first desktop data pipeline studio. Learn about its visual drag-and-drop builder, 290+ connectors, DuckDB integration, and a local AI assistant. Understand its offline capabilities, Git-ready workspaces, and how it simplifies ETL for single-machine workloads.

Overview

Duckle is a local-first desktop data pipeline studio intended to replace heavy cloud ETL and fragile spreadsheet workflows. It provides a visual drag-and-drop builder with 290+ connectors (files, databases, SaaS APIs, streaming, vector stores, and more). Every pipeline compiles to readable SQL that is executed by a columnar engine, DuckDB by default. A built-in AI assistant, Duckie, runs entirely locally: describe a pipeline in plain English and the assistant drops the corresponding graph onto the canvas.

The application ships as a self-contained binary (~30 MB) that downloads its engines on first launch; once they are installed, Duckle operates fully offline. Workspaces are plain text files, so they are diffable and Git-ready. The UI supports 60 languages, and the project is licensed under MIT/Apache-2.0. Duckle targets single-machine workloads; for larger data, outputs can be directed to warehouses or data lakes.

Installation and First Launch

| OS | Asset | How to run |
| --- | --- | --- |
| Windows | Duckle-windows-x64.exe | Double-click; bypass SmartScreen with More info → Run anyway. |
| macOS (Apple Silicon) | Duckle-macos-arm64 | chmod +x, then run; the first time, right-click → Open to bypass Gatekeeper. |
| Linux (x86_64) | Duckle-linux-x64 | chmod +x, then run; requires libwebkit2gtk-4.1-0. |

On first launch, a setup modal guides you to install the DuckDB CLI (required) and optionally the Duckie AI model (~1.1 GB). Choose a workspace folder where all pipelines, connections, and contexts are stored as plain files. A 60‑second quickstart: drag a CSV source, wire a filter, add a Parquet sink, and press Run. Alternatively, click the sparkles icon, type a description, and insert the generated graph.
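For orientation, the quickstart graph (CSV source, filter, Parquet sink) corresponds roughly to a single DuckDB statement. The sketch below is not the SQL Duckle generates, which this article does not show; the sample data, the `amount` column, and the file names are hypothetical. It runs the statement only if a `duckdb` CLI is on the PATH and otherwise prints it.

```shell
# Hypothetical sample data standing in for the CSV source.
workdir="$(mktemp -d)"
printf 'id,amount\n1,10\n2,-5\n3,42\n' > "$workdir/orders.csv"

# Assumed equivalent of: CSV source -> filter (amount > 0) -> Parquet sink.
sql="COPY (SELECT * FROM read_csv_auto('$workdir/orders.csv') WHERE amount > 0) TO '$workdir/orders.parquet' (FORMAT PARQUET);"

if command -v duckdb >/dev/null 2>&1; then
  # Execute against an in-memory database; only the Parquet file is written.
  duckdb :memory: -c "$sql"
else
  echo "duckdb CLI not found; the statement would be:"
  echo "$sql"
fi
```

Keeping the whole pipeline in one `COPY (SELECT ...)` statement means the filter is applied while the CSV is scanned, so no intermediate table is materialized.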

Building Pipelines

The everyday workflow is to add sources, chain transforms (filter, join, aggregate, AI enrichment, cleaning), insert validators (not-null, uniqueness, regex) that route failing rows to a reject port, and finish with sinks (files, databases, object storage, email). Run the graph, and the Output panel shows row counts and timing. Alternatively, the AI assistant streams a graph from a natural-language description, which you can edit node by node afterwards. Reuse encrypted connections and context variables (${var}) to switch between environments. Example recipes include CSV cleanup, Postgres-to-Snowflake nightly loads, RAG ingestion, and Slack digest pipelines. Ready-to-use samples live in the samples/ directory.
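To make the reject-port idea concrete, here is a minimal sketch of the routing logic. It uses awk as a stand-in rather than Duckle's own validators, which compile to SQL. The input rows, column layout (`id,email`), and output file names are hypothetical.

```shell
dir="$(mktemp -d)"
# Hypothetical input: one clean row and four rows that each break a rule.
cat > "$dir/in.csv" <<'EOF'
id,email
1,a@example.com
2,
,b@example.com
1,dup@example.com
3,not-an-email
EOF

# Rows passing every check go to valid.csv; all others go to rejects.csv.
awk -F, '
  NR == 1 { print > ok; print > rej; next }            # copy header to both outputs
  $1 == "" || $2 == "" { print > rej; next }           # not-null on id and email
  seen[$1]++ { print > rej; next }                     # uniqueness on id
  $2 !~ /^[^@]+@[^@]+\.[^@]+$/ { print > rej; next }   # regex check on email
  { print > ok }
' ok="$dir/valid.csv" rej="$dir/rejects.csv" "$dir/in.csv"

cat "$dir/valid.csv"
cat "$dir/rejects.csv"
```

Keeping rejected rows in their own output, instead of silently dropping them, lets a downstream sink store or report them.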

Workspace, Git, and Scheduling

A workspace folder contains pipelines/, connections/, contexts/, routines/, documents/, schedules.json, and run-history/. Everything is plain text, ready for git diff. The top‑bar Git icon opens a panel for status, staging, commit, push/pull, and branch management. Push/pull use your system credential helper; on a 401, Duckle prompts for a personal access token (AES‑encrypted in the workspace). Scheduling is configured in the Schedule panel with cron, interval, or file‑watch triggers. Schedules persist to schedules.json and run while Duckle is open. A headless CLI mode is planned for the 1.0 release.
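As a rough illustration of a Git-ready workspace, the sketch below creates the listed layout in a temporary directory and ignores `.duckle/keys/`, which the article advises keeping out of Git. It assumes that directory sits at the workspace root; the key file name and the empty `schedules.json` are placeholders, not Duckle's real formats.

```shell
ws="$(mktemp -d)"

# Folders named in the article, plus the assumed location of the key store.
mkdir -p "$ws/pipelines" "$ws/connections" "$ws/contexts" "$ws/routines" \
         "$ws/documents" "$ws/run-history" "$ws/.duckle/keys"
: > "$ws/schedules.json"                 # empty placeholder; schema not documented here
: > "$ws/.duckle/keys/example.key"       # hypothetical key file

git -C "$ws" init -q
printf '.duckle/keys/\n' > "$ws/.gitignore"

# check-ignore exits 0 when the path matches an ignore rule.
git -C "$ws" check-ignore -q .duckle/keys/example.key && echo "keys are ignored"

git -C "$ws" add -A
git -C "$ws" status --short
```

Git does not track empty directories, so only `.gitignore` and `schedules.json` are staged until pipelines and connections are saved.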

Configuration Options

| Setting | Where | Effect |
| --- | --- | --- |
| Theme | Top-bar sun/moon toggle | Light/dark, persisted to localStorage |
| Workspace | Top-bar workspace pill → Switch | Change active workspace folder |
| Active engine | Top-bar engine selector | Choose DuckDB (default) or SlothDB |
| Active context | Top-bar context dropdown | Resolve context variables at run time |
| AI base URL | baseUrl prop on AI nodes | Point at any OpenAI-compatible endpoint (default: local Duckie) |
| Per-stage retry | Properties → Advanced tab | Number of attempts and linear backoff |
| Per-stage memory cap | Properties → Advanced tab | Applies PRAGMA memory_limit to that stage |
| DuckDB extensions | Pre-fetched at install; spatial lazy-loaded | Avoids network pauses mid-pipeline |
| RUST_LOG | Environment variable before launch | Set to debug for verbose engine logs |
| DUCKLE_DUCKDB_BIN | Environment variable for tests | Points integration tests at a specific DuckDB CLI |
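The per-stage retry setting combines an attempt count with linear backoff. A minimal sketch of that policy follows, assuming the delay grows by a fixed base after each failed attempt. The `retry` helper is hypothetical and is not Duckle's implementation.

```shell
# retry ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts="$1"; base_delay="$2"; shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    # Linear backoff: wait base, then 2*base, then 3*base, and so on.
    sleep $((base_delay * n))
    n=$((n + 1))
  done
}
```

For example, `retry 3 2 duckdb :memory: -c "PRAGMA memory_limit='2GB'; SELECT 42;"` would try the statement up to three times, waiting 2 s and then 4 s between attempts, with DuckDB's memory capped for that run in the same way the per-stage memory cap is described.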

Constraints, Best Practices, and Key Procedures

Constraints: single‑machine scope; no headless CLI yet; AI model (1.5B Qwen) may need iteration for complex graphs; some connectors are Preview/Planned; no real‑time collaboration; unsigned binaries require bypass; Linux needs libwebkit2gtk; engine downloads are large.

Best practices: use Parquet for intermediates, push filters early, leverage built‑in vector/full‑text search, prefetch lazy extensions, batch AI calls, cap memory on heavy aggregates, use checkpoints, disable debug logging, sort once at the end, and clean data before AI enrichment.

Procedures: engines install to the app-data directory (%APPDATA%\io.duckle.app\engines\ on Windows, etc.). Building from source requires --features custom-protocol. To release, bump the version in tauri.conf.json, commit, tag, and push. Keep .duckle/keys/ out of Git. For connectivity problems, adjust the SSL mode or pre-install extensions with duckdb :memory: -c "INSTALL spatial; LOAD spatial;".

```shell
git clone https://github.com/SouravRoy-ETL/duckle
cd duckle
npm --prefix frontend install

# Development (hot-reload)
cargo tauri dev

# Release build (must include --features custom-protocol)
cargo build --release --manifest-path apps/desktop/Cargo.toml --features custom-protocol
```