Six factors to design for. When the output isn't good, think RIGHTS.
Use it as a checklist when you are designing an AI system, agent, or product — make sure every one of the six factors is properly in place before you ship. Use it as a diagnostic when an existing system underperforms — walk the letters and find the one that's missing or weak.
There is no rescuing a system whose grounding is wrong by adding more tokens. There is no rescuing a system without a feedback loop by buying a smarter model. Most "AI doesn't work" stories are not stories of bad models — they are stories of one of these factors being absent. Get all six in place from the start, and the same model that produced thin work for the team next door will produce work indistinguishable from a senior professional's.
R and I are easy to conflate, and conflating them is why most teams build neither well. Read this before you read the sections on R and I.
R runs after a job. Look at where the agent failed last week, find the pattern, edit the agent files, prompts, skills, and tools so next week's batch goes better. The artefact you shipped is already gone — what you're improving is the machine that produced it.
Without R, every error is one you'll see again next week. The system plateaus at the level of the prompts you wrote on day one.
I runs during a job. The model produces a draft, a critic compares it to G and reports the gap, the model fixes the gap and re-submits. Repeat until the artefact converges on the spec. The system doesn't change — only the artefact does.
Without I, you ship first drafts. First drafts are rarely the answer; the answer is what comes back after the model has been told why the first draft fell short.
R is the loop that runs between jobs. The job already shipped — what you're improving is the system that will produce next week's job, and the week after that.
Most teams skip R entirely. They fix bad output by hand, ship it, and move on. The same class of error reappears two days later in another job, and gets hand-fixed again. The system never compounds. The first job and the thousandth job are produced by the same prompt, and each takes the same amount of human babysitting. R is what makes the thousandth job ten times cheaper to ship than the first.
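Finding the pattern in last week's failures can start as a simple count by error class. This is a minimal sketch, assuming a hypothetical failure log of (job, error class) pairs; the job ids and class names are illustrative only.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical failure log from last week's jobs: (job_id, error_class).
failures: List[Tuple[str, str]] = [
    ("job-01", "wrong_units"),
    ("job-02", "missing_section"),
    ("job-03", "wrong_units"),
    ("job-04", "wrong_units"),
    ("job-05", "missing_section"),
]

def rank_error_classes(log: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count failures per error class, most frequent first."""
    return Counter(error for _, error in log).most_common()

ranked = rank_error_classes(failures)
top_class, top_count = ranked[0]
print(f"Fix first: {top_class} ({top_count} of {len(failures)} failures)")
```

The class at the top of the ranking is the one to fix in the prompts, skills, or tools, so the change lands on the system rather than on a single artefact.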
I is the loop that runs inside a single job. The model produces a draft, a critic compares it to G and reports the gap, the model fixes the gap and re-submits. The artefact converges on the spec.
Without I, every output that ships is a first attempt. First attempts from a model are like first attempts from a person: recognisable, sometimes brilliant, often subtly wrong. The team that builds I is the team whose third draft reads like a senior professional's first.
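The draft, critique, fix cycle can be sketched as a small loop. This is an illustration under stated assumptions, not a definitive implementation: `REQUIRED_SECTIONS` stands in for G, and `critic` and `revise` are hypothetical stand-ins for the critic and the model.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for G: the sections a finished artefact must contain.
REQUIRED_SECTIONS = ["scope", "cost", "risks"]

def critic(artefact: List[str]) -> List[str]:
    """Compare the artefact to the spec and report the gap (missing sections)."""
    return [s for s in REQUIRED_SECTIONS if s not in artefact]

def revise(artefact: List[str], gaps: List[str]) -> List[str]:
    """Stand-in for the model: fix one reported gap per round."""
    return artefact + gaps[:1]

def run_inner_loop(
    draft: List[str],
    critic_fn: Callable[[List[str]], List[str]],
    revise_fn: Callable[[List[str], List[str]], List[str]],
    max_rounds: int = 5,
) -> Tuple[List[str], int]:
    """Revise until the critic reports no gap or the round limit is reached."""
    artefact = draft
    for rounds in range(max_rounds):
        gaps = critic_fn(artefact)
        if not gaps:
            return artefact, rounds  # converged on the spec
        artefact = revise_fn(artefact, gaps)
    return artefact, max_rounds  # limit reached; may not have converged

final, rounds = run_inner_loop(["scope"], critic, revise)
print(final, rounds)
```

Only the artefact changes between rounds. The critic, the spec, and the reviser stay fixed, which is what separates this loop from R.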
G is the spec — the explicit picture of what this deliverable looks like when it is right. Most projects skip this step. They start with a brief and hope the model figures the rest out from training data.
It usually does — fluently, confidently, and subtly wrong. And without G you will never know which part to fix, because you never wrote down what right looks like in the first place. Every other factor is shooting at a target nobody drew.
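Writing down what right looks like can be as small as a typed field list that a draft is checked against. This is a sketch only, assuming a hypothetical spec for one cost-plan line item; the field names are illustrative.

```python
from typing import Any, Dict, List

# Hypothetical spec for a cost-plan line item: field name -> required type.
COST_LINE_SPEC: Dict[str, type] = {"item": str, "quantity": int, "unit_cost": float}

def check_against_spec(record: Dict[str, Any], spec: Dict[str, type]) -> List[str]:
    """Return one message per way the record deviates from the spec."""
    problems = []
    for field, expected in spec.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

# A fluent but subtly wrong draft: quantity is a string, unit_cost is absent.
draft = {"item": "concrete", "quantity": "12"}
problems = check_against_spec(draft, COST_LINE_SPEC)
print(problems)
```

With the spec written down, "subtly wrong" becomes a list of named deviations that someone, or some loop, can fix.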
H is the principle that the deliverable has to land — it has to arrive in a form the recipient can use today, in their existing tools, with no translation step between AI output and human action.
An accurate cost plan returned as JSON is a correct cost plan that nobody opens. A perfect risk register returned as a markdown table is a correct register that the PM has to manually retype into Excel. Output that fails H technically works and is functionally useless — and the failure is doubly invisible, because the team that built it sees the AI succeed at the task while the team that's meant to use it sees nothing they can act on.
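Closing the gap between model output and the recipient's tool is often a small conversion step. This is a minimal sketch, assuming a hypothetical risk register that arrives as a JSON array of flat objects and a recipient who works in a spreadsheet that opens CSV; the sample rows are made up.

```python
import csv
import io
import json

# Hypothetical model output: a risk register as a JSON array of flat objects.
raw = json.dumps([
    {"risk": "Supplier delay", "likelihood": "High", "owner": "PM"},
    {"risk": "Scope creep", "likelihood": "Medium", "owner": "Lead"},
])

def register_to_csv(json_text: str) -> str:
    """Convert a non-empty JSON array of flat objects to CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    # Assumes every row has the same keys as the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = register_to_csv(raw)
print(csv_text)
```

The content is unchanged; only the shape moves to something the PM can open without retyping.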
T is the AI-native tool surface — the set of capabilities and primitives the model can actually call. A model with a great toolbelt operates like a senior professional. A model without one operates like a smart graduate with a phone and no email account.
This is the factor most teams under-invest in, and it is usually the cheapest to lift with the largest effect. A weaker model with a great toolbelt will typically outperform a stronger model with a generic one.
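One way to picture a tool surface is a registry of named primitives the model can call instead of re-deriving the operation each time. The sketch below is an assumption for illustration, not any particular agent framework's API; `tool`, `call_tool`, `sum_costs`, and `to_currency` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Registry of callable primitives, keyed by the name the model would use.
TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable primitive."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def sum_costs(lines: List[Dict[str, float]]) -> float:
    """Total a list of cost lines (quantity * unit_cost)."""
    return sum(line["quantity"] * line["unit_cost"] for line in lines)

@tool
def to_currency(amount: float) -> str:
    """Format an amount with thousands separators and two decimals."""
    return f"{amount:,.2f}"

def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name, as a harness might on the model's behalf."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

total = call_tool("sum_costs", lines=[
    {"quantity": 2, "unit_cost": 10.5},
    {"quantity": 1, "unit_cost": 4.0},
])
print(call_tool("to_currency", amount=total))
```

With primitives like these, the model composes tested parts rather than spending tokens redoing arithmetic and formatting on every job.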
S is the substrate — the operating conditions the model is given. Tokens, context window, time to converge, no mid-task caps. Substrate is the easiest factor to ignore and the most expensive to underprovision: the same agent on a constrained substrate produces visibly worse work than on an open one, and the team that imposed the constraint usually can't see the connection.
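A pre-flight check can make a substrate constraint visible before a long job starts. This is a rough sketch under stated assumptions: the token counts are estimates supplied by the caller, `fits_substrate` is a hypothetical helper, and the 20% reserve is an arbitrary illustrative margin.

```python
def fits_substrate(
    prompt_tokens: int,
    expected_output_tokens: int,
    context_window: int,
    reserve_percent: int = 20,
) -> bool:
    """Check that the job fits the context window with headroom to spare.

    reserve_percent is the share of the window kept free for revisions
    and tool results (an illustrative default, not a recommendation).
    """
    usable = context_window * (100 - reserve_percent) // 100
    return prompt_tokens + expected_output_tokens <= usable

# Hypothetical job: 6,000 tokens of input, about 3,000 tokens of output.
print(fits_substrate(6_000, 3_000, context_window=8_000))
print(fits_substrate(6_000, 3_000, context_window=200_000))
```

A failed check before the job is far easier to connect to the constraint than a truncated deliverable after it.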
The build reading is in the six sections above — design every factor in from the start. This section is the diagnostic reading, for when you have inherited a system or shipped one that underperforms. The instinct is to add more prompt text or upgrade the model; both are usually wrong. Almost always one of the six factors is missing or weak, and fixing that one factor will unlock more quality than anything you can do to the other five combined. Match the symptom you are seeing to the factor that is failing.
| Factor missing | What you'll see | What it means · what to do |
|---|---|---|
| R | The same class of error keeps showing up, job after job. The system never compounds. | No loop is closing on failures at the system level. Build a grader and run atomic trials against the named errors. |
| I | First drafts ship. Quality is whatever the model produced on the first attempt. | The artefact never gets a second look. Add a critic that reports the differential and a loop that fixes it. |
| G | Output is fluent, confident, and subtly wrong. Nobody can articulate exactly what's off. | The model has nothing real to anchor on. Write the spec — the schema, the exemplar, the gold standard — before the next iteration. |
| H | Output is correct but adoption is zero. The team keeps doing the work themselves. | Wrong shape, wrong format, wrong tool. Re-shape the deliverable into the format the user already opens daily. |
| T | Model burns tokens on basic operations and still gets them wrong. Generic tasks feel hard. | The model is deriving from scratch what it should be composing from parts. Build the primitive library and let the model pick parts instead of inventing them. |
| S | Quality collapses on long jobs. Truncation, rate-limit hits, lost context, half-finished deliverables. | Wrong substrate. Move to a harness with token headroom, no mid-task caps, and the time to converge. |
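Walking the letters can also be written down as a tiny checklist. This is a sketch only: the `status` values are a hypothetical self-assessment, and the remedies are shortened from the table above.

```python
from typing import Dict, List, Tuple

# Remedies condensed from the diagnostic table, keyed by factor letter.
REMEDIES: Dict[str, str] = {
    "R": "Build a grader and run atomic trials against the named errors.",
    "I": "Add a critic that reports the differential and a loop that fixes it.",
    "G": "Write the spec before the next iteration.",
    "H": "Re-shape the deliverable into the format the user already opens daily.",
    "T": "Build the primitive library and let the model pick parts.",
    "S": "Move to a harness with token headroom and no mid-task caps.",
}

def walk_letters(status: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (letter, remedy) for each factor not marked as in place."""
    return [(letter, REMEDIES[letter]) for letter in "RIGHTS"
            if not status.get(letter, False)]

# Hypothetical assessment of an inherited system: G and T are weak.
status = {"R": True, "I": True, "G": False, "H": True, "T": False, "S": True}
missing = walk_letters(status)
for letter, remedy in missing:
    print(letter, "->", remedy)
```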
RIGHTS is the design checklist I open every time I scope a new agent or tool surface, the diagnostic I walk every time an existing system underperforms, and the conversation I lead with every team I advise. If you want to walk it through your stack with me, write me an email.