Writing

Flask Won. Here's Why We Built Genkit the Same Way.

The same thing is happening with AI agent frameworks right now.

When Django existed, everyone thought more features meant a better framework. Django had auth, an ORM, an admin panel, sessions, forms — everything. Flask had almost nothing. Route a URL to a function. Return a response. That’s it.

Flask won.

Not because Django was bad. Django is excellent — for what it’s designed for. Flask won because developers who needed to understand their stack couldn’t understand Django’s magic. They could understand Flask. So they used Flask, added exactly what they needed, and built things they actually owned.

The same thing is happening with AI agent frameworks right now.


The Generate Loop

Here’s an AI agent:

async def generate_loop(prompt, tools, max_depth=10):
    for _ in range(max_depth):
        response = await model.generate(prompt, tools)
        if not response.tool_calls:
            return response.text
        results = await execute_tools(response.tool_calls)
        prompt = build_next_prompt(response, results)

That’s it. Call the model. If it wants to use a tool, call the tool and feed the result back. Repeat until it’s done or you’ve hit your depth limit.

Everything that gets called an “AI agent” is this loop running with some form of state. A multi-step research agent is this loop with a larger tool set and a memory system. A customer service bot is this loop with session history attached. A code review agent is this loop with file access tools and a stopping condition.

Genkit formalizes this loop as generate(). That’s the primitive.

from pydantic import BaseModel
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(plugins=[GoogleAI()], model='googleai/gemini-2.0-flash')

class SearchInput(BaseModel):
    query: str

@ai.tool()
async def web_search(input: SearchInput) -> str:
    """Search the web for a query."""
    # your actual search logic here
    return f"Results for: {input.query}"

# The generate loop, formalized:
response = await ai.generate(
    prompt="What's the latest on Gemini 2.5 Pro?",
    tools=[web_search],
)
print(response.text)

Genkit runs the tool loop internally: calls the model, executes tool requests, feeds results back, keeps going until the model produces a final response. Your web_search function is regular Python. Your result handling is regular Python. The framework handles the loop.


What Genkit Adds — And What It Deliberately Doesn’t

What it adds:

The generate loop, formalized as generate():

response = await ai.generate(prompt=..., tools=[...])

Streaming, because output is time-sensitive:

sr = ai.generate_stream(prompt=...)   # no await — returns immediately
async for chunk in sr.stream:
    if chunk.text:
        print(chunk.text, end='', flush=True)
final = await sr.response

The agent harness — loop plus session state — as define_agent():

from genkit.agent import InMemorySessionStore, AgentInit

agent = ai.define_agent(
    name='myAgent',
    model='googleai/gemini-2.0-flash',
    system='You are a helpful assistant.',
    tools=[web_search],
    store=InMemorySessionStore(),
)
conn = await agent.stream_bidi()
await conn.send_text("What's happening with Gemini 2.5 Pro?")
await conn.close()
async for chunk in conn.receive():
    if chunk.model_chunk:
        for part in chunk.model_chunk.content:
            if hasattr(part.root, 'text'):
                print(part.root.text, end='', flush=True)

Tracing and the Dev UI — the one thing you genuinely cannot get from plain Python. Every generate() call inside a @ai.flow() gets an automatic OpenTelemetry span. Every tool call, every model request, every streaming chunk — visible in the Dev UI at localhost:4000. In production, those traces go to Cloud Trace or any OTel backend. You don’t write a single line of instrumentation code.

@ai.flow()
async def research_flow(input: SearchInput) -> str:
    response = await ai.generate(
        prompt=f"Research: {input.query}",
        tools=[web_search],
    )
    return response.text
# → automatic span: research_flow → generate → web_search → generate

What it deliberately doesn’t add:

Graph abstractions for defining control flow. Your control flow is if/else and for loops — the Python you already know.

Prompt template systems. Your prompts are f-strings or Pydantic models — formats you already use.

Custom execution engines. generate() is async Python. You can await it, asyncio.gather() it, call it from a FastAPI handler — it’s just a coroutine.

Framework-specific memory classes. Your state is a Python dict, a Pydantic model, or a database — whatever makes sense for your application.

Cross-cutting concerns like retry? One middleware line:

from genkit.plugins.middleware import Retry

response = await ai.generate(
    prompt=user_input,
    tools=[web_search],
    use=[Retry(max_retries=3)],   # handles RESOURCE_EXHAUSTED, UNAVAILABLE, etc.
)

Your auth logic: normal Python. Your routing: FastAPI routes. Your data models: Pydantic. The AI layer is thin — thin enough that you can read it, understand it, and own it.


Why This Matters in 2026

AI agent code is changing fast. Not slowly fast — month-over-month fast.

In the last twelve months: Gemini 2.5 Pro shipped with substantially better tool-calling. OpenAI released o3 with chain-of-thought tooling. Anthropic dropped extended thinking. Multi-agent patterns that looked experimental six months ago are now standard. Context windows went from 128k to 1M+ tokens.

Every one of those shifts changes what an optimal agent looks like. Better tool-calling changes how you structure your tool set. Longer context changes what you need to store in memory vs. pass in the prompt. Better reasoning changes when you use a single powerful model vs. a chain of smaller ones.

If you’ve built on heavy abstractions, each of these shifts is a framework problem first, then your problem. You wait for the framework to catch up, file GitHub issues, and deal with migration guides that break your working code. Or you start over.

The developers who moved fastest through 2025 had thin AI layers. When Gemini 2.5 Pro dropped, they changed a model string. When parallel tool calls became standard, they updated a generate() call. Their business logic didn’t change because it was never tangled with the AI framework in the first place.

That’s not an accident. That’s what minimum abstraction gets you.


The Coding Agent Multiplier

In 2026, coding agents — Claude Code, Copilot, Codex — write a significant portion of production code. This changes which frameworks win.

Consider this prompt to a coding agent:

“Add retry logic with exponential backoff to the research flow.”

In a Genkit codebase:

# Before
response = await ai.generate(prompt=user_input, tools=[web_search])

# After (3 lines: import + add use=[...])
from genkit.plugins.middleware import Retry
response = await ai.generate(
    prompt=user_input, tools=[web_search], use=[Retry(max_retries=3)]
)

The coding agent adds one import and one parameter. It can read the full Genkit API surface in a single context window. There’s no ambiguity about where retry belongs — it’s use=[...] on generate(). Done.

Now the same prompt against a LangGraph codebase:

The coding agent has to understand the graph topology. Does retry belong in a node? On the edge condition? In the call_model function? Does it interact with checkpoint behavior if the retry happens after a tool call was logged? Should it be a custom node or a wrapper? Different LangGraph apps answer this differently because the framework doesn’t have a standard answer.

Fewer concepts means less surface area to hallucinate. A coding agent working with Genkit makes fewer mistakes than one working with a framework that has a dozen ways to solve every problem. In a world where AI writes significant amounts of production code, this compounds: simpler frameworks produce more reliable AI-assisted development.

Flask was simpler than Django and easier for humans to reason about. Genkit is simpler than LangChain and easier for coding agents to reason about. The principle is the same. The leverage is bigger.


The Bet

We’re making the same bet Flask made: developers who understand their stack build better software. Frameworks that optimize for “magic” optimize against that.

The generate loop is the only primitive an AI agent needs. Call the model. Use tools. Repeat. Everything else should be your code.

That’s not an argument against abstraction. It’s an argument against unnecessary abstraction. Tracing is necessary — code can’t instrument itself. The agent session harness is necessary — the bidi-streaming protocol is not trivial to implement correctly. Everything else? If it’s auth, write auth. If it’s routing, write routing. If it’s data transformation, write a function.

When AI agent patterns shift next month — and they will — thin-layer code adapts. Heavy-abstraction code waits for a migration guide.


The Practical Part

FastAPI + Genkit is the deploy pattern:

from fastapi import FastAPI
from genkit.plugins.fastapi import genkit_fastapi_handler

app = FastAPI()

@app.post('/research', response_model=None)
@genkit_fastapi_handler(ai)
@ai.flow()
async def research_flow(input: SearchInput) -> str:
    response = await ai.generate(
        prompt=f"Research: {input.query}",
        tools=[web_search],
        use=[Retry(max_retries=3)],
    )
    return response.text

Small. Deployable to Cloud Run with uvicorn main:app. Ergonomic for fullstack — the JS team uses @genkit-ai/core with the same flow model, the same DevUI, the same trace format. The AI layer is thin in both directions.

If any of that resonates: start with Getting Started with Genkit Python.


Genkit Python install:

uv pip install "genkit[google-genai] @ git+https://github.com/firebase/genkit.git#subdirectory=py/packages/genkit"

Dev UI: genkit start -- uvicorn main:app --reload. Traces at localhost:4000.