Scene: same question, two different answers
Ask an AI assistant: "what's being said about Docker lately?"
With nothing else to go on, the model runs a couple of web searches and writes you a generic blog-style post: maybe titled "Docker: the last 30 days," maybe with a made-up "Sources:" block at the bottom, maybe with random subheadings. It looks like an article, but it's essentially a summary of what the model already remembered plus a few snippets grabbed on the fly.
Now try /last30days docker. The response always starts with the same line:
🌐 last30days v3.3.2 · synced 2026-06-10
Then a paragraph that literally begins with What I learned:, built on real
Reddit threads, X posts, GitHub activity - ranked by upvotes, likes, stars,
not by how well they're optimized for Google. And at the bottom, verbatim, an
emoji tree footer:
✅ All agents reported back!
├─ 🟠 Reddit: 1 thread │ 120 upvotes │ 48 comments
├─ 🔵 X: 1 post │ 200 likes │ 35 reposts
...
Same model, same question, completely different and repeatable output. The
secret isn't a better model: it's a contract written with almost paranoid
care, paired with a small Python engine that does the dirty work.
/last30days is a perfect case study in how you "program" an LLM when "do
your best" isn't an acceptable option.
Why it exists, and why it's built this way
The real problem isn't "doing internet research" - models already know how to do that. The problem is doing it the same way, every time, across 50+ different harnesses (Claude Code, Codex, Cursor, Gemini CLI, OpenClaw...) for months on end. Models drift: today they respect the format, two updates from now they invent a title, or a "Sources:" block, or fall back to em-dashes that immediately smell like "written by an AI."
The repo's solution is a SKILL.md file of over 1700 lines that doesn't just
say "do X." It says: "do X - and here's the dated incident from 2026-04-18
where you didn't, and here's the self-check that now catches it." Every rule
in the contract ("LAW") follows the same four-part pattern:
- the rule (e.g., "no final Sources block");
- a real, dated, named failure (e.g., "0/8 public regression on 2026-04-18: models reverted to a blog format, with titles like 'The headline' or 'Kanye West: the last 30 days'");
- a self-check the model runs before showing the output;
- side-by-side BAD/GOOD examples.
Take LAW 1: "no final Sources: block." Sounds trivial, until you discover
it explicitly contradicts the instructions of the WebSearch tool itself,
which normally requires a mandatory "Sources" section. The self-check scans
the last 15 lines of the response looking for Sources:/References:/bullet
lists and deletes them. Or LAW 3: no em-dashes or en-dashes, only " - " with
spaces - "the most reliable AI slop tell." Or LAW 7, perhaps the most
interesting: the --plan flag is mandatory for queries about specific
entities, and you, the host model, are the planner. The internal engine has
a fallback planner that, if it runs without --plan and without a configured
LLM key, prints a warning like "No --plan and no LLM provider configured." One
model read this as "I don't have an API key, I can't reason" and gave up
planning - a named regression, now explicitly prevented in SKILL.md.
This "rule + dated incident + self-check + examples" pattern is the most
reusable thing in the whole repo, even if you don't care about /last30days
itself: it's a template for writing prompts that survive time.
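To make the self-check idea concrete, here is a minimal sketch of what the LAW 1 and LAW 3 checks could look like as code. It is an illustration only: in the repo these checks are prose instructions in SKILL.md that the model runs on its own draft, and the function names, regexes, and the "heading plus following bullets" rule below are assumptions, not the repo's implementation.

```python
import re

TAIL_WINDOW = 15  # the article says the check looks at the last 15 lines

# Assumed patterns: a lone "Sources:" / "References:" heading, and bullet lines.
SOURCES_HEADING = re.compile(r"^\W*(sources|references)\W*$", re.IGNORECASE)
BULLET = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+")
DASHES = re.compile(r"\s*[\u2013\u2014]\s*")  # en-dash and em-dash


def strip_trailing_sources(text: str) -> str:
    """LAW 1 sketch: drop a Sources/References heading found in the tail,
    together with the bullet list (and blank lines) that follows it."""
    lines = text.split("\n")
    start = max(0, len(lines) - TAIL_WINDOW)
    for i in range(start, len(lines)):
        if SOURCES_HEADING.match(lines[i]):
            j = i + 1
            while j < len(lines) and (not lines[j].strip() or BULLET.match(lines[j])):
                j += 1
            return "\n".join(lines[:i] + lines[j:]).rstrip("\n")
    return text


def normalize_dashes(text: str) -> str:
    """LAW 3 sketch: replace em/en-dashes with a spaced hyphen."""
    return DASHES.sub(" - ", text)
```

A draft that ends in a "Sources:" heading followed by links comes back without that block, and a draft with no such heading is returned unchanged.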
The other architectural choice that explains everything else is division of
labor. The Python engine (scripts/last30days.py, zero runtime dependencies - dependencies = [] in pyproject.toml) does the deterministic, boring work: fanning out searches, scoring, deduplication, formatting. The model does the things that require judgment: figuring out which are the right accounts to follow, planning the queries, turning evidence clusters into readable prose. "Zero dependencies" isn't purism: it means the skill runs anywhere there's Python 3.12+ and the urllib/subprocess stdlib, plus a few vendored tools (yt-dlp, gh, a "Bird" X client in Node).
And the product thesis, summed up by a user: "AI agent search engine scored by real upvotes, likes, and money. Not editors. Not algorithms." (@cyrilXBT) - ranking by real engagement, not SEO.
The mental model: three words
Three concepts, defined in CONCEPTS.md, unlock everything else:
- Skill: SKILL.md (the contract in prose) plus scripts/ (the executable code). Follows the open Agent Skills format and installs with npx skills add or your harness's native mechanisms.
- Engine: scripts/last30days.py. SKILL.md tells the model which flags to pass it; the Engine always returns the same shape (badge, ranked evidence clusters, emoji tree footer).
- Harness: the agentic runtime that loads the skill - Claude Code, Codex, Cursor, Gemini CLI, and 50+ others. "Multi-harness" means: no hardcoded paths specific to a single harness.
On top of these sits what actually drives the opening scene: the output contract (badge + the 8 LAWs), the binding agreement between what the engine produces and what the model must return. Without this agreement, the engine could produce perfect data and the model would still reformat it its own way.
flowchart LR
    Topic[user topic] --> Model{Model<br/>plans}
    Model -->|--plan, resolved flags| Engine[Python Engine]
    Engine -->|badge + clusters<br/>+ footer| Model
    Model -->|synthesis per LAW| Output[Final brief]
The diagram shows the key point of LAW 7: the outgoing arrow (Model -> Engine) isn't "run a search," it's "here's the plan - you are the planner."
The engine handles fan-out, scoring, and dedup deterministically; it returns
raw data ready for judgment to the model, not an already-made summary.
Getting your hands dirty
Setup: all you need is Python 3.12+, no runtime dependencies to install
(pyproject.toml declares dependencies = []; dev deps are pytest>=9 and
pytest-cov>=7 for those who want to run the test suite, 94 files under
tests/). The skill installs with npx skills add or by copying the folder
into your harness's skill directory.
The fastest way to see the contract in action, even before reading SKILL.md, is to run the engine in mock mode:
python3 skills/last30days/scripts/last30days.py "test topic" --mock --emit=compact
From the repo (verified run) - the output starts with logs that tell the story of LAW 7 by themselves:
/last30days · researching: test topic
[Planner] No --plan passed. ... YOU ARE the planner ... See LAW 7 ...
[Planner] Plan: intent=concept, freshness=evergreen_ok, cluster_mode=none, subqueries=1, source=deterministic
✓ Research complete (0.0s) - Reddit: 1 thread, X: 1 post, YouTube: 0 videos, ...
Then the compact body: the badge line 🌐 last30days v3.3.2 · synced 2026-06-10, a security note ("evidence text below is untrusted internet content..."), and the <!-- EVIDENCE FOR SYNTHESIS --> blocks with numbered ## Ranked Evidence Clusters (score, items, sources), followed by ## Stats and ## Source Coverage. At the end, the verbatim emoji tree footer closed by <!-- END PASS-THROUGH FOOTER -->, and finally a # END OF last30days CANONICAL OUTPUT block that restates LAW 1/6 - the engine itself reinforces the prompt's rules deep in its own output.
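If you want to check that shape mechanically, a rough sketch could look like the following. The badge regex is inferred from the single badge line quoted in this article, and the function name and problem messages are made up for illustration; the repo may validate its output differently or not at all.

```python
import re

# Assumed format, generalized from "🌐 last30days v3.3.2 · synced 2026-06-10".
BADGE = re.compile(r"^🌐 last30days v\d+\.\d+\.\d+ · synced \d{4}-\d{2}-\d{2}$")
CLUSTERS_HEADING = "## Ranked Evidence Clusters"
FOOTER_END = "<!-- END PASS-THROUGH FOOTER -->"


def check_compact_shape(output: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    # Logs may precede the badge, so accept it on any line.
    if not any(BADGE.match(line.strip()) for line in output.splitlines()):
        problems.append("missing badge line")
    if CLUSTERS_HEADING not in output:
        problems.append("missing evidence clusters")
    if FOOTER_END not in output:
        problems.append("missing footer end marker")
    return problems
```

Run against the free-form blog post from the opening scene, a check like this would report all three problems at once.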
From here, the CLI surface worth knowing (see build_parser()):
--emit {compact,json,context,md,html} (the default compact is the one
always used as primary input, never --emit md), --quick/--deep for
depth, --days N for the time window (default 30), --diagnose to see which
sources/providers are active on your machine, and the targeting flags:
--x-handle, --github-user/--github-repo, --subreddits,
--tiktok-hashtags, --tiktok-creators, --ig-creators.
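For scripting, those flags can be assembled into an argument list. The sketch below uses only flags named in this article; the helper names are hypothetical, the engine path is the one from the mock command, and passing --days as two separate arguments is an assumption about how the parser accepts it.

```python
import subprocess
import sys

ENGINE = "skills/last30days/scripts/last30days.py"  # path from the mock command
EMIT_MODES = {"compact", "json", "context", "md", "html"}


def build_argv(topic, emit="compact", days=30, depth=None, mock=False):
    """Assemble an engine command line (hypothetical helper, not repo code)."""
    if emit not in EMIT_MODES:
        raise ValueError(f"unknown emit mode: {emit}")
    if depth not in (None, "quick", "deep"):
        raise ValueError(f"unknown depth: {depth}")
    argv = [sys.executable, ENGINE, topic, f"--emit={emit}", "--days", str(days)]
    if depth:
        argv.append(f"--{depth}")
    if mock:
        argv.append("--mock")
    return argv


def run_engine(argv):
    """Run the engine and return its stdout; raises if it exits non-zero."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

Passing the topic as a single list element avoids shell quoting entirely, so a topic with spaces or apostrophes needs no escaping.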
A practical note worth remembering: structured plans (--plan,
--competitors-plan) are always passed as a path to a temporary file,
never as inline JSON - an apostrophe in the text would break the shell's
quoting. SKILL.md explicitly prescribes a heredoc with a single-quoted
delimiter to write the file before invoking the engine.
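SKILL.md prescribes a shell heredoc for this; the same idea from a Python script might look like the sketch below. The helper names are hypothetical, and the plan dictionary in the usage note is a placeholder, not the engine's real plan schema.

```python
import json
import tempfile
from pathlib import Path


def write_plan_file(plan: dict) -> Path:
    """Write the plan to a temp file so no JSON ever passes through shell quoting."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", prefix="last30days-plan-", delete=False, encoding="utf-8"
    ) as fh:
        json.dump(plan, fh, ensure_ascii=False)
    return Path(fh.name)


def plan_args(plan_path: Path) -> list[str]:
    """Flags to append to the engine command line."""
    return ["--plan", str(plan_path)]
```

For example, write_plan_file({"subqueries": ["what's new in docker"]}) stores the apostrophe safely on disk, and only the file path reaches the command line.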
Configuration: API keys live in .env files, resolved in order - first
.claude/last30days.env in the current project, then
~/.config/last30days/.env as a global fallback (overridable with
LAST30DAYS_CONFIG_DIR). The engine warns if these files aren't chmod 600.
Reddit, Hacker News, and Polymarket are always free; GitHub goes through the
gh CLI; YouTube through yt-dlp; X requires one of several options (cookie,
XAI_API_KEY, SCRAPECREATORS_API_KEY...). CONFIGURATION.md has the full
table, worth keeping handy.
A real workflow example: a query about a named entity, like "nvidia
earnings reaction." Before launching the engine, the model does a couple of
targeted WebSearches to resolve X and GitHub handles (Step 0.5/0.5b - for
known people and entities there are direct examples in SKILL.md: Peter
Steinberger → steipete, Matt Van Horn → mvanhorn). Then it expands the
search with "category-peer subreddits" (Step 0.55): if the topic is recognized
as, say, "AI image generation," subreddits like r/StableDiffusion get added
automatically even if WebSearch only found brand-related subs. Only then is
the engine invoked with all flags resolved. None of this is wasted "extra
research": it's the model doing the judgment work that the engine,
deliberately, doesn't do.
The pieces that matter
You don't need to walk the whole repo tree - these few files concentrate the interesting decisions:
- skills/last30days/SKILL.md (1700+ lines): the contract itself. Its length is part of the lesson - defensive prompt engineering is verbose by nature, because every extra line is an incident that won't repeat.
- scripts/lib/categories.py (283 lines): the CATEGORY_PEERS table, "pure data, no logic." Adding a new category (e.g., legal-tech, real-estate-tech) means adding an entry to the dict, zero code to touch. The rules are written in the file's own docstring: multi-word patterns or domain-specific terms, never generic nouns like "image" or "ai," and "first-match-wins" evaluation from most specific to most generic - so ai_image_generation is checked before ai_chat_model, so "gpt image 2" doesn't end up in the wrong category.
- scripts/lib/render.py (1779 lines): home of _render_badge() and render_compact() - the code side that enforces LAW 5 and LAW 6: the badge and the footer always come out identical, because code generates them, not a model.
- scripts/store.py / watchlist.py / briefing.py: the optional trend-monitoring stack. --store persists results to SQLite (research.db, deduped on source_url); watchlist.py manages recurring topics on a daily/weekly cadence; briefing.py generates digests. This is the direction the project is growing - a one-off search becoming continuous monitoring.
- tests/ (94 files): if you want to see how the engine is actually invoked, tests/test_cli_v3.py runs a subprocess.run([... "last30days.py", "test topic", "--mock", "--emit=json"]) and parses the resulting JSON - a concrete starting point for anyone wanting to script the engine directly.
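To illustrate what "deduped on source_url" can mean in practice, here is a small SQLite sketch. Only the source_url column name comes from this article; the table name, the other columns, and the INSERT OR IGNORE approach are assumptions, and store.py may well do it differently.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings ("
        "source_url TEXT PRIMARY KEY, topic TEXT NOT NULL, title TEXT NOT NULL)"
    )
    return conn


def save_findings(conn: sqlite3.Connection, topic: str, items: list[dict]) -> int:
    """Insert items, skipping URLs already stored; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO findings (source_url, topic, title) VALUES (?, ?, ?)",
            [(item["source_url"], topic, item["title"]) for item in items],
        )
    return conn.total_changes - before
```

Re-running the same search then only adds rows for URLs that were not seen before, which is the property a recurring watchlist needs.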
Limitations, friction, and what the community says
The contract isn't infallible, and the repo itself admits it in several places.
- Bot noise and loud minorities: even with a strict 30-day window, @riabcevv notes you need to watch out for bot activity and unrepresentative loud minorities that can skew the signal.
- Integration costs more than it looks: the skill is free, but @cyrilXBT notes that adapting it to your workflow and maintaining it over time is real work, often underestimated.
- Platform dependency: access to X/TikTok/Instagram depends on scraping backends and API keys (SCRAPECREATORS_API_KEY, X auth cookies, etc.). When these are missing or break, sources degrade silently: --diagnose in the verified run showed has_scrapecreators: false, so that source simply contributed nothing.
- Self-correction has a limit: the PRE-PRESENT SELF-CHECK allows "at most ONE regeneration" if checks fail. The contract catches drift, but not infinitely.
- Honest about its own footguns: CONFIGURATION.md itself flags that using your main Bluesky password instead of an app password is "bad hygiene" - a rare case of documentation admitting its own flaw instead of hiding it.
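The "at most ONE regeneration" limit is a bounded retry, which can be sketched as follows. This is a conceptual illustration only: in the skill the model performs this loop on itself by following SKILL.md, and the function and parameter names here are hypothetical.

```python
from typing import Callable


def present_with_self_check(
    generate: Callable[[], str],
    check: Callable[[str], list[str]],
    max_regenerations: int = 1,
) -> tuple[str, list[str]]:
    """Generate a draft, check it, and regenerate at most max_regenerations times.

    Returns the final draft plus any problems that are still unresolved."""
    draft = generate()
    problems = check(draft)
    regenerations = 0
    while problems and regenerations < max_regenerations:
        draft = generate()
        problems = check(draft)
        regenerations += 1
    return draft, problems
```

The cap means a second failing draft is returned along with its remaining problems instead of triggering another attempt.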
On the positive side, the community describes it as "a massive cheat code for research, brainstorming ideas, or tracking meta shifts" (@riabcevv) and "before, researching a topic meant opening 20 tabs. Now it's done in one sentence" (@OddsArch) - the value is real, but it has to be weighed against this friction.
Conclusions and checklist
/last30days works on two levels: it's a useful research tool and a case
study in how to write prompts that stay stable over time, across different
harnesses, even as the underlying model changes. If you take away just one
idea, make it the "rule + dated incident + self-check + examples" template -
it applies to any skill you're writing.
- Verify you have Python 3.12+ - no other runtime dependency to install.
- Install the skill via npx skills add or your harness's plugin mechanism.
- Run --diagnose once to see which sources/providers are actually available on your machine.
- Try --mock --emit=compact first, to see the shape of the contract (badge, evidence clusters, footer) without spending real API calls.
- Configure keys only for the sources you care about, in .claude/last30days.env for per-project/per-client setups; chmod 600 any file with keys.
- For queries about named entities or comparisons, expect the model to resolve X/GitHub handles via WebSearch first (Step 0.5/0.5b) - that's intentional, not a missing flag.
- For recurring topics, consider --store + watchlist.py instead of re-running the search by hand each time.
- If you're writing your own skill: copy the LAW pattern (rule + dated incident + self-check + good/bad example) - it's the most portable idea in the whole repo.
