Loop design in Genkit Python
Lance Martin's self-correction pattern from his Fable 5 guide maps onto the primitives Genkit Python already has.
Lance Martin published a guide this week on designing loops with Fable 5. Two techniques: self-correction loops, and memory as an outer loop wrapping them. Neither requires a new framework primitive. Both map directly onto what Genkit Python already gives you.
The mechanics of a self-correction loop are worth stating concretely. You call a model, get output, pass that output to a separate verifier, get a verdict with feedback, and if it fails, retry with the feedback appended to the original prompt. You cap the retries at some small number and return the best attempt. That is the whole thing.
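Before any framework is involved, those mechanics fit in a few lines of plain Python. This is a minimal sketch, not Genkit's API or Martin's code. `generate` and `verify` are hypothetical async callables that you would back with real model calls. `ScoredVerdict` is a stdlib stand-in for a verdict, and it assumes the verifier also returns a numeric `score`, which gives "best attempt" a concrete meaning.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ScoredVerdict:
    passed: bool
    feedback: str
    score: float  # assumption: higher means closer to the goal


Generate = Callable[[str], Awaitable[str]]
Verify = Callable[[str], Awaitable[ScoredVerdict]]


async def self_correct(task: str, generate: Generate, verify: Verify,
                       max_attempts: int = 3) -> str:
    """Generate, verify, retry with feedback; return the best attempt."""
    best_output, best_score = "", float("-inf")
    prompt = task
    for _ in range(max_attempts):
        output = await generate(prompt)
        verdict = await verify(output)
        if verdict.passed:
            return output
        if verdict.score > best_score:
            best_output, best_score = output, verdict.score
        # Append feedback to the ORIGINAL task rather than the previous
        # prompt, so feedback from earlier attempts does not pile up.
        prompt = f"{task}\n\nPrior attempt failed: {verdict.feedback}"
    return best_output
```

Because `generate` and `verify` are injected, you can test the loop with stub functions and no model at all. In real code, the verifier would be a separate call with its own prompt.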
The part that matters is the verifier. The natural instinct is to ask the same model to review its own output. This usually fails. The model uses the same reasoning to evaluate as it did to generate, which means it tends to reproduce the same blind spots. A verifier works when it has an external standard to check against, one that is defined before the loop runs, not inferred from the task description. Martin calls this the goal. In Claude Code it is a /goal declaration. In a Genkit flow, it is a Pydantic model you define up front and pass in as input.
Here is the implementation.
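```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI
from pydantic import BaseModel

ai = Genkit(plugins=[GoogleAI()], model='googleai/gemini-2.0-flash')

MAX_ATTEMPTS = 3


class Goal(BaseModel):
    # The external standard, fixed by the caller before the loop runs.
    criteria: list[str]


class TaskInput(BaseModel):
    task: str
    goal: Goal


class Verdict(BaseModel):
    passed: bool
    feedback: str


@ai.flow()
async def correction_loop(input: TaskInput) -> str:
    # Render the criteria as a bulleted list instead of a Python list repr.
    criteria = "\n".join(f"- {c}" for c in input.goal.criteria)
    feedback = ""
    output = ""
    for _ in range(MAX_ATTEMPTS):
        prompt = input.task
        if feedback:
            # Carry the verifier's feedback into the next attempt.
            prompt += f"\n\nPrior attempt failed: {feedback}\nCorrect your approach."
        output = (await ai.generate(prompt=prompt)).text

        # Separate verifier call, graded against the caller's criteria.
        result = await ai.generate(
            prompt=(
                "Grade this output against each criterion.\n"
                f"Criteria:\n{criteria}\n\n"
                f"Output:\n{output}"
            ),
            output_format='json',
            output_schema=Verdict,
        )
        # model_validate accepts either a parsed dict or a Verdict instance.
        verdict = Verdict.model_validate(result.output)
        if verdict.passed:
            return output
        feedback = verdict.feedback
    # Retries exhausted: return the last attempt.
    return output
```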
Three things to notice. The goal is passed in by the caller, not inferred by the verifier from the task text. The verifier grades against an explicit list of criteria, which is what separates this from asking the model to “check your work.” And the feedback from a failed attempt travels into the next attempt’s prompt directly, so each retry has the context of what went wrong before.
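You can make grading "against each criterion" literal by asking the verifier for one result per criterion and then deriving `passed` in code, instead of trusting a single overall boolean. The sketch below is a hedged variation, not part of the flow above. `CriterionResult` and `aggregate` are hypothetical names, and the sketch assumes the verifier's JSON has already been parsed into these records.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion: str
    met: bool
    note: str = ""


def aggregate(results: list[CriterionResult]) -> tuple[bool, str]:
    """Pass only if every criterion is met; feedback lists the failures."""
    if not results:
        # An empty grade sheet should not pass vacuously.
        return False, "The verifier returned no per-criterion results."
    failures = [r for r in results if not r.met]
    feedback = "\n".join(
        f"- {r.criterion}: {r.note}" if r.note else f"- {r.criterion}"
        for r in failures
    )
    return not failures, feedback
```

With `passed` derived this way, the verifier cannot call an output passing while also reporting an unmet criterion. The feedback that travels into the retry also names exactly the criteria that failed.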
If you are deciding what to put in criteria, the useful test is: would a human reviewer flag a violation of this rule without needing to interpret it? Concrete conditions work (“the function handles an empty list”, “the response is under 150 words”). Vague ones do not (“the output is high quality”, “the code is clean”). Vague criteria produce lenient verdicts, which defeats the loop.
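Criteria concrete enough to pass that test are often concrete enough to check in code, which takes the model out of the verdict for those rules entirely. Here is a minimal sketch. It assumes you keep such checks in a hypothetical `CHECKS` table keyed by the exact criterion text. Any criterion without an entry, such as "the function handles an empty list", still goes to the model verifier.

```python
from typing import Callable

Check = Callable[[str], bool]

# Hypothetical table of criteria that can be checked without a model.
CHECKS: dict[str, Check] = {
    "the response is under 150 words": lambda text: len(text.split()) < 150,
}


def split_criteria(criteria: list[str]) -> tuple[list[str], list[str]]:
    """Partition criteria into (checkable in code, needs the model verifier)."""
    mechanical = [c for c in criteria if c in CHECKS]
    judged = [c for c in criteria if c not in CHECKS]
    return mechanical, judged


def run_checks(criteria: list[str], text: str) -> list[str]:
    """Return the mechanically checkable criteria that the text violates."""
    return [c for c in criteria if c in CHECKS and not CHECKS[c](text)]
```

Matching on the exact criterion string keeps the sketch short. A sturdier version might attach an ID to each criterion and key the checks by that ID.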
The second loop in Martin’s guide operates at a different timescale. The self-correction loop runs within a single task. The memory loop runs across tasks, accumulating facts from one session that are available in the next. It is an outer loop: each run of the inner loop can generate learnings, those learnings get written to persistent storage, and future runs read them before starting.
Genkit Python has no built-in memory primitive. You can build one with structured output and a JSON file.
```python
# Continues from the block above: reuses ai, BaseModel, and TaskInput.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")


def load_memory() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def save_memory(facts: list[str]) -> None:
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))


class MemoryUpdate(BaseModel):
    learned_facts: list[str]


@ai.flow()
async def memory_loop(input: TaskInput) -> str:
    # Read the rules distilled by earlier runs before starting.
    memory = load_memory()
    ctx = "\n".join(f"- {f}" for f in memory) if memory else "No prior knowledge."
    output = (await ai.generate(
        prompt=f"Prior learnings:\n{ctx}\n\nTask: {input.task}"
    )).text

    # Ask for transferable rules, not a log of this run.
    result = await ai.generate(
        prompt=(
            f"Task: {input.task}\n"
            f"Output: {output}\n\n"
            "What rules should be remembered for future similar tasks?"
        ),
        output_format='json',
        output_schema=MemoryUpdate,
    )
    # model_validate accepts either a parsed dict or a MemoryUpdate instance.
    new_facts = MemoryUpdate.model_validate(result.output).learned_facts
    save_memory(memory + new_facts)
    return output
```
The distillation call at the end matters. You are not appending the full task and output to memory. You are asking the model to extract transferable rules from the experience. Martin describes this as a five-rung ladder: fail, investigate, verify, distill, consult. The distillation call here handles verify and distill together. What you get back is rules, not logs. Future runs consult the rules rather than rederiving them from scratch.
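The flow above appends every extracted fact on every run. The file therefore only grows, and it can collect restatements of the same rule, which drifts back toward a log. One way to keep it a rule set is to merge before saving. The sketch below rests on a few assumptions. Two facts that differ only in case or spacing count as the same rule. Newer wording wins. A hypothetical `MAX_FACTS` cap keeps only the most recent rules. You would call `merge_facts(memory, new_facts)` in place of `memory + new_facts`.

```python
MAX_FACTS = 50  # hypothetical cap; tune for your prompt budget


def _key(fact: str) -> str:
    # Facts that differ only in case or spacing are treated as the same rule.
    return " ".join(fact.lower().split())


def merge_facts(existing: list[str], new: list[str],
                limit: int = MAX_FACTS) -> list[str]:
    """Deduplicate, let newer wording win, and keep at most `limit` facts."""
    merged: dict[str, str] = {}
    for fact in existing + new:
        key = _key(fact)
        if not key:
            continue  # skip blank entries
        # Pop and re-insert so the latest occurrence moves to the end.
        merged.pop(key, None)
        merged[key] = fact.strip()
    facts = list(merged.values())
    return facts[-limit:] if limit > 0 else []
```

Dropping the oldest rules first is a simple policy, not the only reasonable one. A version that tracks how often each rule proved useful could evict by usage instead.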
If you are deciding whether to combine these two flows, the answer depends on whether your tasks are independent or sequential. If each run is a fresh task with no relationship to prior runs, memory adds noise. If your agent keeps returning to the same ground (debugging the same codebase, drafting in the same domain, operating the same pipeline), memory compounds. The correction loop is always worth having. The memory loop earns its cost only when the domain recurs.
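When tasks do recur, the two flows compose with the memory loop on the outside and the correction loop on the inside. This is a framework-agnostic sketch built on hypothetical async callables. `inner` stands in for a self-correction loop that takes a prompt string, and `distill` returns a list of rules. Memory lives in the same kind of JSON file as above.

```python
import json
from pathlib import Path
from typing import Awaitable, Callable

Inner = Callable[[str], Awaitable[str]]
Distill = Callable[[str, str], Awaitable[list[str]]]


async def run_with_memory(task: str, memory_file: Path,
                          inner: Inner, distill: Distill) -> str:
    # Outer loop, before: read rules written by earlier runs.
    memory = json.loads(memory_file.read_text()) if memory_file.exists() else []
    ctx = "\n".join(f"- {f}" for f in memory) or "No prior knowledge."
    # Inner loop: self-correction runs with the prior rules in its prompt.
    output = await inner(f"Prior learnings:\n{ctx}\n\nTask: {task}")
    # Outer loop, after: distill transferable rules and persist them.
    new_rules = await distill(task, output)
    memory_file.write_text(json.dumps(memory + new_rules, indent=2))
    return output
```

Memory is consulted once per task, not once per retry. The inner loop's retries all share the same prior rules, and distillation sees only the final output.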
Both of these are just flows. A flow is an async function with @ai.flow(). A loop is a for loop inside it. The framework gives you generate() and structured output via Pydantic. The rest is Python.
What Genkit Python does not give you is a first-class Goal type or an evaluator primitive. You define both yourself. That is probably the right call for now: the pattern is clear enough that a shared primitive would be more opinionated than useful. If you find yourself writing the same verifier scaffolding across many flows, that is the signal that an abstraction is worth building. One flow is not that signal.
A note on model selection: both flows above send every call, the verifier's included, to the default googleai/gemini-2.0-flash. You may want a stronger model for the verifier, since it needs to reason carefully against specific criteria. You may want a faster, cheaper model for the worker, since every failed attempt that gets retried multiplies its cost. Genkit lets you pass model= to any generate() call, so this is a one-line change per call site.
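To keep those one-line changes consistent, each call site can read its model from a single place. This is a sketch under assumptions. `Models` is a hypothetical config, and the verifier id is a placeholder rather than a real model name. `generate` stands in for a call like `ai.generate(model=..., prompt=...)` and is faked in testing, so the routing can be checked without a model.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable

Generate = Callable[..., Awaitable[str]]


@dataclass(frozen=True)
class Models:
    worker: str = "googleai/gemini-2.0-flash"
    verifier: str = "googleai/<stronger-model>"  # placeholder, not a real id


async def attempt_once(task: str, generate: Generate,
                       models: Models) -> tuple[str, str]:
    """One worker call and one verifier call, each routed to its own model."""
    output = await generate(model=models.worker, prompt=task)
    verdict = await generate(
        model=models.verifier, prompt=f"Grade this output:\n{output}"
    )
    return output, verdict
```

Note that a task that never passes costs one worker call and one verifier call per attempt. A stronger verifier therefore raises the cost of every retry, not just the last one.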
Martin’s full guide is here.