Overview
Duckle is a local‑first desktop data pipeline studio that replaces heavy cloud ETL and fragile spreadsheet workflows. It provides a visual drag‑and‑drop builder with 290+ connectors (files, databases, SaaS APIs, streaming, vector stores, and more). All pipelines compile to readable SQL executed by a columnar engine (DuckDB by default). A built‑in AI assistant (Duckie) runs entirely locally: describe a pipeline in plain English and the assistant drops the corresponding graph onto the canvas. The application ships as a single binary (~30 MB) that downloads its engines on first launch; once they are installed, Duckle can operate fully offline. Workspaces are plain text files, diffable and Git‑ready. The UI is available in 60 languages, and the project is MIT/Apache‑2.0 licensed. Duckle targets single‑machine workloads; for larger data, outputs can be directed to warehouses or data lakes.
Installation and First Launch
| OS | Asset | How to run |
|---|---|---|
| Windows | Duckle-windows-x64.exe | Double‑click; bypass SmartScreen with More info → Run anyway. |
| macOS (Apple Silicon) | Duckle-macos-arm64 | chmod +x then run; first time right‑click → Open to bypass Gatekeeper. |
| Linux (x86_64) | Duckle-linux-x64 | chmod +x then run; requires libwebkit2gtk-4.1-0. |
On first launch, a setup modal guides you to install the DuckDB CLI (required) and optionally the Duckie AI model (~1.1 GB). Choose a workspace folder where all pipelines, connections, and contexts are stored as plain files. A 60‑second quickstart: drag a CSV source, wire a filter, add a Parquet sink, and press Run. Alternatively, click the sparkles icon, type a description, and insert the generated graph.
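
For Linux and macOS, the "chmod +x then run" step from the table can be wrapped in a small helper. This is only a sketch: `launch_duckle` is a hypothetical name and is not part of Duckle; it just marks a downloaded asset executable and starts it.

```shell
# Hypothetical helper (not shipped with Duckle): make a downloaded asset
# executable and start it.
launch_duckle() {
  asset="$1"
  [ -f "$asset" ] || { echo "asset not found: $asset" >&2; return 1; }
  # A bare file name would be looked up on PATH, so anchor it to the
  # current directory.
  case "$asset" in
    */*) ;;
    *) asset="./$asset" ;;
  esac
  chmod +x "$asset" || return 1
  "$asset"
}

# Usage, with the asset names from the table above:
#   launch_duckle Duckle-linux-x64
#   launch_duckle Duckle-macos-arm64
```

On macOS the first start still needs the right‑click → Open step described above; the helper does not bypass Gatekeeper.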
Building Pipelines
The everyday workflow: add sources, chain transforms (filter, join, aggregate, AI enrichment, cleaning), insert validators (not‑null, uniqueness, regex) that route failures to a reject port, and finish with sinks (files, databases, object storage, email).
Run the graph; the Output panel shows row counts and timing.
The AI assistant streams a graph from a natural‑language description; you can edit the generated nodes afterwards.
Reuse encrypted connections and context variables (${var}) for environment switching.
Example recipes include CSV cleanup, Postgres‑to‑Snowflake nightly loads, RAG ingestion, and Slack digest pipelines.
Ready‑to‑use samples live in the samples/ directory.
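
Since pipelines compile to SQL that DuckDB executes, it can help to picture what a source → filter → sink graph might reduce to. The sketch below is an illustration, not Duckle's actual generated SQL: the file names, the predicate, and the `build_quickstart_sql` helper are all assumptions. It builds one DuckDB statement for a CSV source, a filter, and a Parquet sink, and runs it only if a DuckDB CLI and the input file are present.

```shell
# Hypothetical helper: compose the SQL for CSV source -> filter -> Parquet sink.
# File names and the predicate are illustrative, not Duckle output.
build_quickstart_sql() {
  src="$1"; dst="$2"; predicate="$3"
  printf "COPY (SELECT * FROM read_csv_auto('%s') WHERE %s) TO '%s' (FORMAT PARQUET);\n" \
    "$src" "$predicate" "$dst"
}

sql=$(build_quickstart_sql orders.csv orders.parquet "amount > 0")
echo "$sql"

# Execute only when a DuckDB CLI is on PATH and the input file exists.
if command -v duckdb >/dev/null 2>&1 && [ -f orders.csv ]; then
  duckdb :memory: -c "$sql"
fi
```

Pushing the filter into the same statement that reads the CSV mirrors the "push filters early" advice: rows are discarded before anything is written.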
Workspace, Git, and Scheduling
A workspace folder contains pipelines/, connections/, contexts/, routines/, documents/, schedules.json, and run-history/.
Everything is plain text, ready for git diff.
The top‑bar Git icon opens a panel for status, staging, commit, push/pull, and branch management.
Push/pull use your system credential helper; on a 401, Duckle prompts for a personal access token (AES‑encrypted in the workspace).
Scheduling is configured in the Schedule panel with cron, interval, or file‑watch triggers.
Schedules persist to schedules.json and run while Duckle is open.
A headless CLI mode is planned for the 1.0 release.
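
The same versioning can be set up from a terminal instead of the Git panel. A minimal sketch, assuming the `.duckle/keys/` directory (which should stay out of Git) sits at the workspace root; `init_workspace_repo` is a hypothetical helper name.

```shell
# Hypothetical helper: put a workspace folder under Git and ignore key material.
init_workspace_repo() {
  ws="$1"
  [ -d "$ws" ] || { echo "workspace not found: $ws" >&2; return 1; }
  (
    cd "$ws" || exit 1
    # Create the repository only if one does not exist yet.
    [ -d .git ] || git init -q || exit 1
    # Assumption: .duckle/keys/ is relative to the workspace root.
    # Append the ignore rule once, so repeated calls stay idempotent.
    grep -qxF '.duckle/keys/' .gitignore 2>/dev/null ||
      echo '.duckle/keys/' >> .gitignore
  )
}

# Usage (the path is a placeholder):
#   init_workspace_repo "$HOME/duckle-workspace"
```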
Configuration Options
| Setting | Where | Effect |
|---|---|---|
| Theme | Top‑bar sun/moon toggle | Light/dark, persisted to localStorage |
| Workspace | Top‑bar workspace pill → Switch | Change active workspace folder |
| Active engine | Top‑bar engine selector | Choose DuckDB (default) or SlothDB |
| Active context | Top‑bar context dropdown | Resolve context variables at run time |
| AI base URL | baseUrl prop on AI nodes | Point at any OpenAI‑compatible endpoint (default: local Duckie) |
| Per‑stage retry | Properties → Advanced tab | Number of attempts and linear backoff |
| Per‑stage memory cap | Properties → Advanced tab | Applies PRAGMA memory_limit to that stage |
| DuckDB extensions | Pre‑fetched at install; spatial lazy‑loaded | Avoids network pauses mid‑pipeline |
| RUST_LOG | Environment variable before launch | Set to debug for verbose engine logs |
| DUCKLE_DUCKDB_BIN | Environment variable for tests | Points integration tests at a specific DuckDB CLI |
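
The two environment variables in the table are set in the shell that starts the process. A small sketch: `with_debug_logs` is a hypothetical wrapper, and the test command and DuckDB path in the comments are assumptions rather than documented values.

```shell
# Hypothetical wrapper: run any command with verbose engine logging enabled.
with_debug_logs() {
  RUST_LOG=debug "$@"
}

# Usage (asset name from the installation table):
#   with_debug_logs ./Duckle-linux-x64
#
# Assumption: integration tests are started with `cargo test`, and
# /usr/local/bin/duckdb is a placeholder path to a DuckDB CLI.
#   DUCKLE_DUCKDB_BIN=/usr/local/bin/duckdb cargo test
```

Setting the variable as a command prefix keeps it scoped to that one process, so debug logging does not leak into later launches.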
Constraints, Best Practices, and Key Procedures
Constraints: single‑machine scope; no headless CLI yet; AI model (1.5B Qwen) may need iteration for complex graphs; some connectors are Preview/Planned; no real‑time collaboration; unsigned binaries require bypass; Linux needs libwebkit2gtk; engine downloads are large.
Best practices: use Parquet for intermediates, push filters early, leverage built‑in vector/full‑text search, prefetch lazy extensions, batch AI calls, cap memory on heavy aggregates, use checkpoints, disable debug logging, sort once at the end, and clean data before AI enrichment.
Procedures: engines install to app‑data (%APPDATA%\io.duckle.app\engines\ on Windows, etc.).
Building from source requires --features custom-protocol.
To release, bump tauri.conf.json, commit, tag, and push.
Keep .duckle/keys/ out of Git.
For connectivity problems, adjust the connection's SSL mode or pre‑install the required extensions, for example: duckdb :memory: -c "INSTALL spatial; LOAD spatial;".

    git clone https://github.com/SouravRoy-ETL/duckle
    cd duckle
    npm --prefix frontend install

    # Development (hot-reload)
    cargo tauri dev

    # Release build (must include --features custom-protocol)
    cargo build --release --manifest-path apps/desktop/Cargo.toml --features custom-protocol
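
The release steps described above (bump tauri.conf.json, commit, tag, push) can be sketched as a small helper. Several things here are assumptions: the `tag_release` name, the `v<version>` tag format, and the commit message. The path to tauri.conf.json is passed in rather than guessed, and the version bump itself is expected to have been made already.

```shell
# Hypothetical helper: commit an already-bumped tauri.conf.json and tag it.
# Assumptions: tags are named v<version>; the config path is supplied by the caller.
tag_release() {
  conf="$1"; version="$2"
  [ -f "$conf" ] || { echo "config not found: $conf" >&2; return 1; }
  # Refuse to tag if the file does not mention the version being released.
  grep -q "\"version\": *\"$version\"" "$conf" || {
    echo "$conf does not contain version $version" >&2
    return 1
  }
  git add "$conf" &&
    git commit -q -m "Release v$version" &&
    git tag "v$version"
}

# Usage, from the repository root (path and version are placeholders):
#   tag_release path/to/tauri.conf.json 0.1.0
#   git push origin HEAD --tags
```

The push is left as a separate manual step so the commit and tag can be inspected locally first.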





