agentic-coding · 2026-05-17

Kaizen in Conductor — 8 Versions, 24 Months, No Big Bang

kaizenconductoragentic-codingiterationjapanlean

TL;DR

Kaizen is the Japanese principle of small, continuous improvements — and the actual operations mode behind Conductor, my agentic coding stack. v1.0 ran May 2024 as a monolithic mega-plugin. v8.1.0 runs May 2026 as a 13-plugin ecosystem. Eight major versions over 24 months = on average one major every 3 months. None of them a big bang. Each fixed exactly one old design flaw. This post shows: what Kaizen concretely means in software, where Toyota’s lean-manufacturing roots are, how it manifests in Conductor, and under which conditions this is transferable to other indie-builder stacks.

Kaizen — small steps, continuous improvement Featured: stylized Kaizen staircase — small steps, same direction.

What Kaizen is (short)
The Toyota root
Conductor as a Kaizen exercise
The 1% logic in daily code work
What is not Kaizen
Three practice heuristics
FAQ

What Kaizen is (short)

Kaizen (改善) is Japanese for “change for the better.” At its core a discipline: what can we do 1% better tomorrow than today? Not radical. Not loud. Not as a big bang.

Three properties distinguish Kaizen from generic “iteration” concepts:

Incremental. Every improvement is small. Not “we refactor the whole system.”
Continuous. Not one-workshop-a-year. Rather: ask a question every day.
Shared. At Toyota, Kaizen was explicitly mandatory for all line workers, not just managers. In indie mode that means: not just the “big plan” gets improved, but every workflow component.

The Toyota root

Kaizen as a term was institutionalized in the early 1950s in the Toyota Production System (TPS). Masaaki Imai’s book “Kaizen: The Key to Japan’s Competitive Success” (1986) popularized the term in the West. Toyota used Kaizen explicitly to continuously improve the manufacturing line — with ~1,000,000+ implemented employee suggestions per year at peak (Toyota Production System Yearly Report 1998).

The decisive insight: small improvements accumulate over time to greater effect than rare large undertakings. 1% per day over 365 days = factor 37.8 ((1.01)^365). 1% per week over 52 weeks = factor 1.68. Frequency beats magnitude.

Toyota actually exploited this math — and is today the consistently most profitable car manufacturer in the world. Kaizen is not Japanese spirituality; it is lean-manufacturing arithmetic.

Conductor as a Kaizen exercise

Conductor is my productive agentic coding stack. Eight major versions over 24 months. Each fixed one design flaw — not all at once.

Version	Date	Kaizen step
v1.0	May 2024	Monolithic mega plugin. Worked for 3 weeks.
v2.0	July 2024	Split into 4 smaller plugins. Token budget saved ~70%.
v3.0	October 2024	Dedicated `conductor` orchestrator plugin. Plugin-calls-plugin for the first time.
v4.0	January 2025	`auto-test` as separate parallel plugin. Largest productivity jump.
v5.0	April 2025	Smart-router attempt, failed. Rollback after 2 weeks.
v6.0	August 2025	MCP server for shared state between plugins.
v7.0	December 2025	Hooks system for quality gates before push.
v8.0	March 2026	Deterministic orchestrator instead of LLM-driven.
v8.1.0	May 2026	Stabilization + `marketing-pilot` from experimental to production.

Observation: none of these versions was an “everything at once” jump. Each addressed exactly one old pain.

Even v5.0 (rollback) is Kaizen-compliant: small step tried, measured, bad, rolled back. That is not failure. That is the discipline of Kaizen. Toyota rolled back countless line experiments — the idea is not “every change must work” but “we try in small steps and keep only what measures better.”

More details on the individual version breaks: what does not work in agentic coding.

The 1% logic in daily code work

Conductor versions are the macro level. But Kaizen also happens daily, in the micro:

Action	Frequency	Kaizen element
`claude --plugin knowhow project-status` in the morning	daily	Identifies “where did I leave off yesterday” — a pragmatic mini-reflection
Pre-push hook enforcing tests/build	every commit	An honest question: is my last step clean?
90-minute slot hard-cut	multiple times daily	Discipline of the mini-break
Weekly repo sweep	weekly	Which repos are current, which are dead? Archive.
Monthly “what did I learn last month”	monthly	Written down, otherwise the lessons evaporate
Annual repo sweep in December	yearly	Major inventory step that recalibrates the whole system

None of these activities is a “big project”. But together they produce a system that looks significantly different after 2 years than at the start.

Concrete example from last month (April 2026): the marketing-pilot plugin had a bug that stopped the hashtag generator too early on multilingual posts. A 12-line fix. Took 35 minutes. Moved marketing-pilot from “functional” to “actually productive.” Three weeks later this improvement was the trigger to move marketing-pilot from experimental to production.

No single 35-minute fix would be release-worthy on its own. But cumulatively, enough such fixes resulted in v8.1.0.

What is not Kaizen

Three anti-patterns I have committed myself, illustrating what Kaizen is NOT:

1. “We rebuild the system”

Classic trap. In the IBM Vista project I saw this multiple times — new architect or new CTO arrives, says “all new.” 80 slides later, 6 months of implementation stagnation. Kaizen says: do not rebuild. Improve the existing step.

My own mistake in 2025: Conductor v5.0 was essentially an “I rebuild the orchestrator” step. Did not work. Rollback. v6.0 was Kaizen-compliant again — improved one specific component (MCP state).

2. “We wait until it is perfect”

The opposite of Wabi-Sabi. Anyone serious about Kaizen accepts: every single improvement may be unfinished as long as it measures better. Conductor v2.0 had its own bugs. Was better than v1.0. Ship.

3. “We hold a strategy retreat about it”

Process overhead. In indie mode a strategy session kills any Kaizen energy. A breakfast-table question (“what annoys me about the auto-test plugin?”) replaces three workshops.

Three practice heuristics

If you want to test Kaizen in your own workflow:

Heuristic 1: One question per day

Every morning ask one question: “What rubbed the most yesterday?” Answer is usually a specific friction point. Fix in 30-60 min. That is one Kaizen iteration.

Heuristic 2: Write it down

Improvements not captured are lost. For me: CHANGELOG.md in every repo, plus a knowhow-plugin entry in Conductor. Toyota calls this the Kaizen suggestion form.

Heuristic 3: Measure before and after

An improvement is only one if it is measurable. With Conductor v5.0 the rollback only worked because I had measured (response latency, output quality) — otherwise I would not have known v5.0 was worse than v4.0.

In indie mode very simple metrics often suffice: “Did this plugin call speed up my workflow last week?” — yes/no. No BI tooling needed.

Where this goes next

Conductor v9 is the next Kaizen iteration. Three specific steps, each small:

Cross-repo convention versioning (~2 weeks)
Subagent state passing as first-class concept (~3 weeks)
Multi-LLM sub-task routing (~4 weeks)

Probably 2-3 months of work, spread across 6-8 months of calendar time. No big bang. A major version, but it is the sum of many small corrections.

If you build an agentic coding stack with Kaizen discipline and want to trade notes: reach me on LinkedIn.

FAQ

How does Kaizen differ from “Agile”?

Agile (Scrum, XP, Kanban) is a concrete methodology with ceremonies (sprint planning, retro, stand-up). Kaizen is an underlying principle — continuous improvement in small steps — that can be practiced within Agile but is older and more general (Toyota applied Kaizen 30 years before the Agile Manifesto in 2001).

Does Kaizen work for solo founders or only in teams?

Both. In solo mode Kaizen is actually easier because no consensus is needed. You decide in the morning what you improve today, and do it. The harder discipline is the continuity — no one asks you after 3 weeks whether you are still at it.

Is Kaizen compatible with big-bang releases?

Not really. Anyone serious about Kaizen avoids big-bang releases as default. But: major versions can be Kaizen releases. Conductor v7.0 is a major version (hooks system) but is the sum of ~12 preparatory small steps. Not one giant idea but consolidation.

Masaaki Imai: “Kaizen: The Key to Japan’s Competitive Success” (1986) for the Toyota root. Dry management reading but definitive. Robert Maurer: “The Spirit of Kaizen” (2012) is more accessible and transfers the principle to individual practice.

How do I measure whether my Kaizen is working?

After 6 months: compare your current workflow to that of 6 months ago. If it is tangibly different — and measures better (output per hour, frustration per day, bug incidence) — Kaizen is working. If it is just “different” without clear improvement, you confused motion with improvement.

Written on May 17, 2026 in Hamburg. This methodology is not my invention — it is 70 years of Toyota plus 24 months of Conductor. If you find this post useful, link to it.

← Back to blog