My Architect, part 4: the agent's working loop

This is the fourth post in the series about My Architect. In part 1 I showed the agent's loop in a single code block and promised to take it apart in detail. Parts 2 and 3 covered the YAML file storage and the planning model. Today, the working loop itself: how the agent picks up a task, works on it, and closes it so that the project remains a living source of truth rather than an after-the-fact journal.

Session start: one call instead of a recap

I used to begin every new session with a recap: what's done, what we decided, where we left off. Now it's one call:

get_project_context({ pid })
→ meta + hierarchy + backlog + diagrams + stats

The agent gets the entire hierarchy with statuses, releases and statistics in one go instead of five separate requests.

That leaves the question of how the agent knows the pid. Hardcoding it into the skill is not an option, because the skill is universal. So the skill encodes a ladder of attempts. First the agent searches the project's local CLAUDE.md for the pattern pid: "..." near a mention of my_architect. If nothing is found, it calls list_projects: if there is exactly one project, that's the one; if there are several, the agent asks the human instead of guessing. If there are zero projects, the agent offers scaffold_project but doesn't run it without confirmation. Creating a project is a scoping decision, not routine.
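The ladder above can be sketched as a pure function. This is only an illustration: the type names, the five-line search window, and the idea of passing in the already fetched project ids are assumptions. Only CLAUDE.md, list_projects, and scaffold_project come from the skill itself.

```typescript
// Hypothetical result shape for the pid ladder; not the skill's real output.
type PidResolution =
  | { kind: "found"; pid: string; source: "CLAUDE.md" | "list_projects" }
  | { kind: "ask_human"; candidates: string[] }
  | { kind: "offer_scaffold" };

// Step 1: look for pid: "..." within a few lines of a my_architect mention.
// The window size is an assumption made for this sketch.
function findPidInClaudeMd(text: string, window = 5): string | null {
  const lines = text.split("\n");
  const mentions = lines
    .map((line, i) => (/my_architect/i.test(line) ? i : -1))
    .filter((i) => i >= 0);
  for (const i of mentions) {
    const nearby = lines.slice(Math.max(0, i - window), i + window + 1).join("\n");
    const match = /pid:\s*"([^"]+)"/.exec(nearby);
    if (match) return match[1] ?? null;
  }
  return null;
}

// Steps 2-4: fall back to the list of project ids (as list_projects might return them).
function resolvePid(claudeMd: string | null, projectIds: string[]): PidResolution {
  const fromFile = claudeMd === null ? null : findPidInClaudeMd(claudeMd);
  if (fromFile !== null) return { kind: "found", pid: fromFile, source: "CLAUDE.md" };

  const [first, ...rest] = projectIds;
  // Zero projects: only offer scaffolding, never run it without confirmation.
  if (first === undefined) return { kind: "offer_scaffold" };
  // Exactly one project: that's the one.
  if (rest.length === 0) return { kind: "found", pid: first, source: "list_projects" };
  // Several projects: ask instead of guessing.
  return { kind: "ask_human", candidates: projectIds };
}
```

Returning a tagged result instead of a bare string keeps the two "stop and talk to the human" outcomes explicit, so the caller cannot silently proceed with a guessed pid.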

Which task to pick

get_next_task picks a task by three rules, in priority order: the earliest release first, then the deepest tree level within the release, and ties broken alphabetically by title. Nodes with status done and cancelled are excluded outright; nodes without a release go to the back of the queue.

Before the sort there's one more filter: if there are leaves among the candidates (nodes with no children), the choice is made only among them. This is leaf-first, and the reason is practical. A parent node like "Auth flow" can't be completed directly; it closes when all of its children are closed. If the agent takes the parent directly, it will try to do several things in one pass and blur the scope of the work. A leaf is atomic: one pass, one commit, clear acceptance criteria. And the parents will close on their own, via the cascade described below.

Alphabetical order in third place gives determinism. The same backlog always returns the same task, so the agent's behavior can be predicted and verified.
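Here is a minimal sketch of the selection rules, under stated assumptions: the TaskNode shape, the numeric releaseOrder field (lower means earlier, null means no release), and the function name pickNextTask are all hypothetical and not the real get_next_task implementation.

```typescript
// Hypothetical node shape for this sketch.
interface TaskNode {
  id: string;
  title: string;
  status: string;              // "done" and "cancelled" are excluded
  depth: number;               // 0 = root, larger = deeper in the tree
  releaseOrder: number | null; // lower = earlier release; null = no release
  childIds: string[];
}

// Plain code-unit comparison, so the order does not depend on the machine's locale.
function compareTitles(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function pickNextTask(nodes: TaskNode[]): TaskNode | null {
  const open = nodes.filter((n) => n.status !== "done" && n.status !== "cancelled");

  // Leaf-first: if any open candidate has no children, choose only among those.
  const leaves = open.filter((n) => n.childIds.length === 0);
  const pool = leaves.length > 0 ? leaves : open;

  // Nodes without a release sort behind every node that has one.
  const rank = (n: TaskNode): number =>
    n.releaseOrder === null ? Number.POSITIVE_INFINITY : n.releaseOrder;

  const sorted = pool.slice().sort((a, b) => {
    if (rank(a) !== rank(b)) return rank(a) < rank(b) ? -1 : 1; // 1. earliest release
    if (a.depth !== b.depth) return b.depth - a.depth;          // 2. deepest level
    return compareTitles(a.title, b.title);                     // 3. alphabetical tie-break
  });

  const first = sorted[0];
  return first === undefined ? null : first;
}
```

Because every comparison step is total and the last one falls back to the title, the result does not depend on the order in which nodes arrive, which is what makes the pick reproducible.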

The task loop

The full sequence looks like this:

get_next_task({ pid })            → a task with its path and parent context
start_task({ pid, nodeId })       → in-progress, assignee: agent
get_node → get_doc                → read the node's docs BEFORE the code
  ... work; update_doc along the way, build_hierarchy
      if a sub-breakdown surfaced ...
validate_project({ pid })         → fix dangling references
complete_task({ pid, nodeId, summary })
                                  → done, cascade up the ancestors,
                                    the next task already in the response

The most interesting part here is complete_task. Before updating, it takes a snapshot of the entire ancestor chain, sets the node to done, records the summary, and compares ancestor statuses before and after. If the last task of a feature just closed, the feature is marked done automatically, and the agent sees it in the response as an explicit list: which node, what the status was, what it became. The same response carries next_task, so no separate call for the next task is needed and the loop closes itself.
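The snapshot-and-compare step can be illustrated with a small in-memory sketch. The PlanNode shape and the function below are hypothetical stand-ins, not the server's code, and one rule here is an assumption: a parent counts as closable when every child is done or cancelled.

```typescript
// Hypothetical in-memory model for this sketch.
interface PlanNode {
  id: string;
  parentId: string | null;
  status: string;
  summary?: string;
}

interface StatusChange {
  nodeId: string;
  from: string;
  to: string;
}

// Assumption: cancelled children do not keep a parent open.
const isClosed = (n: PlanNode): boolean => n.status === "done" || n.status === "cancelled";

const byId = (nodes: PlanNode[], id: string | null): PlanNode | undefined =>
  id === null ? undefined : nodes.filter((n) => n.id === id)[0];

function completeTask(nodes: PlanNode[], nodeId: string, summary: string): StatusChange[] {
  const node = byId(nodes, nodeId);
  if (node === undefined) throw new Error(`unknown node: ${nodeId}`);

  // 1. Collect the ancestor chain (nearest parent first) and snapshot its statuses.
  const ancestors: PlanNode[] = [];
  for (let a = byId(nodes, node.parentId); a !== undefined; a = byId(nodes, a.parentId)) {
    ancestors.push(a);
  }
  const before = ancestors.map((a) => a.status);

  // 2. Close the task itself and record the summary.
  node.status = "done";
  node.summary = summary;

  // 3. Cascade upward, stopping at the first ancestor that still has open children.
  for (const a of ancestors) {
    const children = nodes.filter((n) => n.parentId === a.id);
    if (!children.every(isClosed)) break;
    if (!isClosed(a)) a.status = "done";
  }

  // 4. Compare before and after, and report only what actually changed.
  const changes: StatusChange[] = [];
  ancestors.forEach((a, i) => {
    const from = before[i];
    if (from !== undefined && from !== a.status) {
      changes.push({ nodeId: a.id, from, to: a.status });
    }
  });
  return changes;
}
```

Reporting the diff rather than the full ancestor chain keeps the response short: an empty list means nothing above the task moved.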

If a task hits an external obstacle, there's block_task with a mandatory reason. The reason is written into the node, and on the canvas the human sees not a faceless red rectangle but a concrete "waiting for keys from the payment provider".

Docs are updated along the way, not afterwards

Markdown documents are attached to nodes, and they have a single purpose: to describe how the feature works now. Not how it was planned and not how it ought to have been. So the skill carries a hard rule: after picking up a task, read the node's docs first, because the truth about the feature is there, not in the title. If understanding changes mid-work, call update_doc immediately, in the same turn.

A node or a doc that lies is worse than not having one.

Before closing a task the agent must run validate_project. The validator returns a list of problems with type and severity: dangling references to docs and diagrams, cycles in the hierarchy, lost parents. Broken references in the source of truth are not a cosmetic issue, so complete_task is not called without a clean run.
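To make the three problem classes concrete, here is a rough sketch of what such a check could look like. Everything in it is an assumption for illustration: the node shape, the problem type names, and which severity each problem gets are invented here, and diagram references are left out for brevity.

```typescript
// Hypothetical shapes; the real validator's output format may differ.
interface VNode {
  id: string;
  parentId: string | null;
  docIds: string[];
}

interface Problem {
  type: "dangling_doc" | "lost_parent" | "cycle";
  severity: "error" | "warning"; // the severity assignment below is an assumption
  nodeId: string;
}

function validateProject(nodes: VNode[], existingDocIds: string[]): Problem[] {
  const problems: Problem[] = [];
  const find = (id: string | null): VNode | undefined =>
    id === null ? undefined : nodes.filter((n) => n.id === id)[0];

  for (const node of nodes) {
    // Dangling reference: the node points at a doc that does not exist.
    for (const docId of node.docIds) {
      if (existingDocIds.indexOf(docId) === -1) {
        problems.push({ type: "dangling_doc", severity: "warning", nodeId: node.id });
      }
    }

    // Lost parent: parentId is set but no node has that id.
    if (node.parentId !== null && find(node.parentId) === undefined) {
      problems.push({ type: "lost_parent", severity: "error", nodeId: node.id });
    }

    // Cycle: walking up the parents leads back to the node itself.
    const seen: string[] = [node.id];
    let cursor = find(node.parentId);
    while (cursor !== undefined) {
      if (seen.indexOf(cursor.id) !== -1) {
        if (cursor.id === node.id) {
          problems.push({ type: "cycle", severity: "error", nodeId: node.id });
        }
        break; // stop on any repeated id so the walk always terminates
      }
      seen.push(cursor.id);
      cursor = find(cursor.parentId);
    }
  }
  return problems;
}

// The gate described above: close the task only after a clean run.
const isCleanRun = (problems: Problem[]): boolean => problems.length === 0;
```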

"We'll finish it later" no longer gets lost

Any agent generates deferred items as it works: "we'll cover this with tests later", "retries aren't wired up yet", "we'll need caching when load grows". In a chat these phrases die with the session. In My Architect the rule is different: every such deferred item becomes a node before the end of the current turn.

Where exactly it lands is decided by a rubric from the skill. The agent records technical debt from the feature it just closed silently: same epic, same release, a prefix like tech-debt:. An improvement with an explicit trigger ("when paying users appear", "if the false-positive rate exceeds five percent") is also recorded without questions, but into a future release and with the trigger right in the description, so it's clear later when the node "fires". Strategic questions are different: whether the feature is needed at all, whether to change the defaults, whether to pull in a new dependency. Here the agent must stop and hand the decision to the human, presenting placement options with trade-offs.

When in doubt, the skill says to "err on the side of asking": thirty seconds of confirmation is cheaper than a node surfacing in the wrong release a month later. And before creating anything there's always a duplicate check against the already loaded context, because a duplicate in the backlog breaks get_next_task prioritization: with leaf-first and alphabetical order, the copy can outrun the real work.
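The rubric and the duplicate check can be sketched as one routing function. The names and shapes below are hypothetical, as is the title normalization used to spot duplicates and the choice to keep a triggered improvement in the current epic. The skill itself is a set of written instructions, not this code.

```typescript
// Hypothetical classification of a deferred item; the agent decides the kind.
type DeferredKind = "tech_debt" | "triggered_improvement" | "strategic" | "unsure";

interface DeferredItem {
  title: string;
  kind: DeferredKind;
  trigger?: string; // e.g. "when paying users appear"
}

interface TurnContext {
  epicId: string;           // epic of the feature just closed
  currentRelease: string;
  futureRelease: string;
  existingTitles: string[]; // from the already loaded project context
}

type Placement =
  | { action: "create"; epicId: string; release: string; title: string; description: string }
  | { action: "ask_human"; reason: string }
  | { action: "skip_duplicate"; existing: string };

// Assumption: titles match if they are equal ignoring case and the tech-debt: prefix.
const normalize = (t: string): string =>
  t.toLowerCase().replace(/^tech-debt:\s*/, "").trim();

function placeDeferredItem(item: DeferredItem, ctx: TurnContext): Placement {
  // Duplicate check comes first, before anything is created.
  const dup = ctx.existingTitles.filter((t) => normalize(t) === normalize(item.title))[0];
  if (dup !== undefined) return { action: "skip_duplicate", existing: dup };

  if (item.kind === "tech_debt") {
    // Recorded silently: same epic, same release, prefixed title.
    return {
      action: "create",
      epicId: ctx.epicId,
      release: ctx.currentRelease,
      title: `tech-debt: ${item.title}`,
      description: "",
    };
  }

  if (item.kind === "triggered_improvement" && item.trigger !== undefined && item.trigger.trim() !== "") {
    // Recorded silently into a future release, with the trigger in the description.
    return {
      action: "create",
      epicId: ctx.epicId,
      release: ctx.futureRelease,
      title: item.title,
      description: `Trigger: ${item.trigger}`,
    };
  }

  // Strategic questions and anything unclear go to the human.
  const reason = item.kind === "strategic"
    ? "strategic question: present placement options with trade-offs"
    : "unclear placement: err on the side of asking";
  return { action: "ask_human", reason };
}
```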

Reconciling the plan with the code

A plan tends to drift away from the codebase: something got done outside the tracker, something got finished and nobody marked it. For that there's the slash command /my-architect:reconcile. The agent walks all draft nodes, starting from the nearest release, and for each checks against the code whether it has already shipped: it looks for routes, components, tests, and reads them. Nodes confirmed by code are closed via bulk_update_nodes, and the agent checks the response. What's partially done stays draft with a note of what exactly is missing. The command's main rule: never mark a node done without evidence in the code. Accuracy matters more than the number of nodes closed.

The second command, /my-architect:progress, answers "where are we": done percentages by releases and epics, the current in-progress task and what's left on it, the next task in the queue. If the command sees draft nodes that by all appearances have already shipped, it will itself suggest running reconcile.
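The per-release percentage in such a report comes down to simple counting. This sketch assumes a flat list of nodes with a release label and a status, and it assumes cancelled nodes are left out of both the numerator and the denominator. The real command may count differently.

```typescript
// Hypothetical flat view of the plan for reporting purposes.
interface ProgressNode {
  release: string | null;
  status: string;
}

// Returns the done percentage per release, rounded to a whole number.
function donePercentByRelease(nodes: ProgressNode[]): Record<string, number> {
  const tally: Record<string, { done: number; total: number }> = {};

  for (const n of nodes) {
    // Assumption: nodes without a release and cancelled nodes are not counted.
    if (n.release === null || n.status === "cancelled") continue;
    const t = tally[n.release] ?? { done: 0, total: 0 };
    t.total += 1;
    if (n.status === "done") t.done += 1;
    tally[n.release] = t;
  }

  const result: Record<string, number> = {};
  for (const release of Object.keys(tally)) {
    const t = tally[release];
    if (t !== undefined && t.total > 0) {
      result[release] = Math.round((100 * t.done) / t.total);
    }
  }
  return result;
}
```

The same tally grouped by epic instead of release would give the second half of the report.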

What this buys you

The agent can run a project for weeks. Sessions end, context windows get compressed, but the structure remains: every closed task left a summary, every deferred item became a node, every doc describes the current state of the code. The human can open the canvas at any moment and see an honest picture, not the one that was current as of the last call.

In part five I'll cover distribution through the Claude Code marketplace: how the plugin that installs the skill and the MCP server with one command is built, and why the skill lives in a public repository. You can try My Architect at my-architect.app.