I run growth for a SaaS that's done $2 million CAD+ in lifetime revenue, and most of the day-to-day work is now shared between me and 12 named AI agents running on a single Anthropic subscription. This page is the whole system, laid out honestly: the org chart, the code, the numbers, the wins, the misses, and the parts I'd build differently if I started today.
I'm publishing it because I think more founders and operators should be building this way and I want to help. A year ago I wanted to read exactly this page and couldn't find it. So I'm writing mine, with the receipts and the rough edges still in. The public repo is on the way too, so you'll be able to read the actual code.
The Growth Engine is a 12-agent system, orchestrated by a single COMMAND agent, that runs lifecycle email, in-app messaging, SEO analysis, funnel diagnostics, revenue reporting, support analytics, and adversarial plan review for Speak AI. It's a Python repo, not a SaaS tool, and every agent has a named role, a defined scope, and a profile file the orchestrator loads on demand.
That's the short version for the snippet. Here's what it actually looks like.
It's a Python repo with 13 modules, among them CRM, funnel, support, ads, revenue, Sentry, social, website experiments, WordPress, link building, and billing. Each module has a handful of scripts that pull data from the tools we already pay for (SendGrid, Intercom, GSC, GA4, Stripe, Paddle, RevenueCat) and write the outputs back into plans, reports, and draft sends. The agents read those scripts, draft copy, audit sequences, score experiments, and flag risks. Nothing sends without a plan, and no plan ships without me approving it. That part matters a lot, and I'll come back to it in the methodology section.
The reason I keep calling it an “engine” instead of an “agent” or an “assistant” is that it's not one model doing everything. It's 12 specialists with narrow lanes, sitting under one orchestrator (COMMAND) that I talk to. That structure is doing most of the heavy lifting. A single generalist agent loses the plot once the surface area gets wide enough; a team of specialists with clear scope doesn't. I'll show you the org chart and the file layout below so it's concrete.
What I'm actually running this on, right now: Cursor as the editor, Claude (via Claude Code in the terminal) as the model, GitHub for the repo. Our team also runs a couple of Claude Max plans and rotates between them when one hits its weekly cap on heavy work, which is part of why token efficiency (SCROOGE's whole job) matters in the first place. I'm exploring Codex inside this same flow, and watching local models. The longer-term goal is for the agent layer to be model-agnostic, even though today it's Claude-heavy.
The piece I tell anyone getting started: per project, per website, per repo, you need three things in place before this works for you. The context (a CLAUDE.md with the rules of how you work), the instructions (the agents and the skills), and the credentials (so the agents can actually touch your tools). Once those three are sitting in a repo, you have a lot of power. That's the whole setup, and it's the same setup I'd recommend if you were running this across multiple sites or products.
Every agent has a codename, a role, a domain, and a profile at .claude/agents/{codename}.md that loads only when that agent is spawned. COMMAND is the orchestrator and the only agent that talks to me. The other 12 are specialists: they receive a delegated task, do the work in their lane, and report back. They don't talk to each other. They don't drift out of scope. They push back when a plan is weak.
The Growth Engine team. COMMAND is the only agent I speak to; the 12 specialists are spawned on demand and report back inside a single Claude Code session.
Owns SendGrid sequences, Intercom campaigns, deliverability, and suppression. Verifies unsubscribe headers and compliance footers before any marketing send.
Writes and audits email copy, in-app messages, and landing page copy. Edits to my voice from a sample set, not from a template library.
Default-in on any external-facing change. Asks the only question that matters: does this move activation, conversion, or retention.
Reads the ICP profile docs and checks every cohort-targeted send against power-user fit. Section 7 below is a full deep-dive on this one.
On-demand ICE ranking across growth experiments. Runs when I have more proposed experiments than the week can hold, which is most weeks.
Pulls GSC and GA4, correlates page changes to query movement, and flags decay before it becomes a quarterly problem.
Reads support conversations in bulk, surfaces topic clusters, resolution quality, and the documentation gaps that keep generating the same ticket.
Weekly view across Stripe, Paddle, and RevenueCat. New-customer cohort, plan distribution, and the trend that's getting better or worse this month.
Diagnoses where users drop off and whether the existing lifecycle coverage actually catches them before they bounce.
Stops scope creep at the planning stage. Flags any work item that's drifted into “while we're in here” territory.
Adversarial review of every plan. Pushes back on bloat, overengineering, and any abstraction that doesn't earn its weight.
Audits agent preambles and cold-start cost. Keeps the whole substrate fitting inside one Claude Max plan instead of three.
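The on-demand ICE ranking in the roster above can be sketched in a few lines. This is a minimal illustration, not the code from the repo: it assumes the common Impact, Confidence, Ease scoring on a 1-10 scale with the three values multiplied together, and the experiment names and scores are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    name: str        # hypothetical experiment label
    impact: int      # 1-10: expected effect on activation, conversion, or retention
    confidence: int  # 1-10: how sure we are the effect is real
    ease: int        # 1-10: how cheap the experiment is to ship

    @property
    def ice(self) -> int:
        # Assumption: the score is the product of the three inputs.
        return self.impact * self.confidence * self.ease


def rank(experiments: list[Experiment]) -> list[Experiment]:
    """Return experiments sorted from highest to lowest ICE score."""
    return sorted(experiments, key=lambda e: e.ice, reverse=True)


# Illustrative backlog only; none of these come from the real system.
backlog = [
    Experiment("onboarding checklist", impact=8, confidence=6, ease=4),
    Experiment("pricing page rewrite", impact=9, confidence=3, ease=3),
    Experiment("annual-plan nudge", impact=7, confidence=5, ease=8),
]

for exp in rank(backlog):
    print(f"{exp.ice:4d}  {exp.name}")
```

Multiplying rather than averaging punishes any experiment that is weak on one axis, which suits a week that can only hold a few bets. Swap in an average if you prefer a gentler ranking.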
The org chart is a real artifact, not a marketing diagram. Every node has a profile file on disk. Every relationship is encoded in the operating rules: agents report through COMMAND, and any agent that finds work outside their lane stops and escalates. That discipline is what makes the team feel like a team instead of one model wearing 12 hats.
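The two mechanics in that paragraph, profiles loaded only at spawn time and an escalate-when-out-of-lane rule, can be sketched roughly like this. It is an illustration under assumptions: the `.claude/agents/{codename}.md` pattern comes from the text, but the `LANES` map, the lane labels, and both function names are hypothetical.

```python
from pathlib import Path

AGENTS_DIR = Path(".claude/agents")  # path pattern taken from the text


def load_profile(codename: str, agents_dir: Path = AGENTS_DIR) -> str:
    """Read an agent profile only when that agent is spawned."""
    path = agents_dir / f"{codename}.md"
    if not path.is_file():
        raise FileNotFoundError(f"no profile for agent {codename!r} at {path}")
    return path.read_text(encoding="utf-8")


# Hypothetical lane map for illustration; real scopes would live in the profiles.
LANES = {
    "BEACON": {"lifecycle-email"},
    "SEO": {"seo"},
}


def route(codename: str, domain: str) -> str:
    """Return 'accept' if the task is in the agent's lane, otherwise 'escalate'."""
    return "accept" if domain in LANES.get(codename, set()) else "escalate"
```

Because nothing is read until `load_profile` is called, agents that are not spawned in a session add nothing to the context, and `route("BEACON", "seo")` returns `"escalate"` instead of letting the agent drift.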
What's been most useful for me is how much the work has compressed. A full lifecycle email audit, a complete SEO query analysis across the site, a from-scratch onboarding sequence draft: each of those used to be a quarter-long project. Now they're a few hours of focused work, often while I'm on the walking pad dictating into the laptop. That's the actual delta, and it's why I'm spending the time to write this page in the first place.
I get asked this a lot, mostly by people who already use n8n, Zapier, or Make and want to know what's different. Honest answer: workflows are great, and a bunch of this stack still uses them for the wiring. The difference is in what's doing the thinking.
A workflow executes the rules I wrote. If I forgot a case, it forgets too. If the situation changes, the workflow keeps doing what it did yesterday until I rewrite it. Agents are the other shape: they read the context fresh every time, draft a proposal, and tell me when something looks off. Last week BEACON flagged that a sequence I was about to ship had a step that would land on a Sunday morning, which is one of our worst send windows historically. A workflow would have sent it. An agent saw it and stopped.
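The Sunday-morning catch is the kind of check that is easy to express once someone thinks to ask for it. A minimal sketch, assuming "Sunday morning" means Sunday before 12:00 in the recipient's local time and that a sequence is a list of day offsets from its start; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def in_blocked_window(send_at: datetime) -> bool:
    """True if the send lands on a Sunday before noon (assumed definition)."""
    return send_at.weekday() == 6 and send_at.hour < 12  # Monday is 0, Sunday is 6


def flag_steps(start: datetime, delays_days: list[int]) -> list[tuple[int, datetime]]:
    """Return (step_index, send_time) for every step that hits the blocked window."""
    flagged = []
    for index, delay in enumerate(delays_days):
        send_at = start + timedelta(days=delay)
        if in_blocked_window(send_at):
            flagged.append((index, send_at))
    return flagged


# A sequence starting Thursday 2026-04-02 at 09:00 with steps on days 0, 3, and 7.
start = datetime(2026, 4, 2, 9, 0)
for index, send_at in flag_steps(start, [0, 3, 7]):
    print(f"step {index} lands on {send_at:%A %H:%M}")
```

The code itself is trivial. What the agent adds is noticing that the check was worth running on this plan before anyone wrote the rule.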
The other piece is institutional knowledge. The agents share the same docs I do: ICP profiles, lifecycle journey notes, the running tally of what's worked and what hasn't. That context lives in markdown files in the repo, and every agent reads its slice before doing the work. So when I spawn QUILL to write a re-engagement email, QUILL isn't writing from scratch. It's writing inside the same set of opinions I'd bring myself, just faster and without the part where I get distracted by Slack.
If you already love workflows, keep them. The agents sit on top, not underneath. Most weeks the agents propose the change, I approve it, and then a regular workflow does the actual sending. That layering is the part that finally clicked for me.
The single rule that makes any of this safe to run on a real customer base: nothing ships without a plan, and no plan ships without me approving it. Every agent is wired around that rule. It's the first sacred rule in AGENTS.md and it's the one I'm most protective of.
COMMAND delegates to the right specialist. The specialist drafts a plan with concrete diffs, sample sends, or report output.
SPARTAN, SCOPE, and CONVERT review for bloat, scope creep, and funnel impact. ICP signs off on cohort fit.
I read every plan before it executes. Approve, reject, or send back with notes. No exceptions for marketing sends.
BEACON sends the email. SEO publishes the page. The plan moves to the archive with the actual outcome attached.
Two of those steps are non-negotiable. The first is the review pass by the guard agents. SPARTAN is the one I'd recommend any founder copy first: it reviews every plan adversarially, calls out anything bloated or premature, and refuses to agree with COMMAND just to be agreeable. It's saved me from at least a dozen “we should also” moments that would have turned a clean change into a sprawling one.
The second is my approval. For lifecycle sends and prospect outreach the rule is even tighter: every prospect email is a human-reviewed draft follow-up, not a sequence that fires on its own. The agents write the draft, surface the context, and queue it for me. I open it, edit it, and decide whether to send. That's the only mode I'm comfortable running customer-facing messaging in, and it's the only mode I'd recommend to anyone else running this on real people.
The plan-first discipline is also where the speed actually comes from, which sounds paradoxical until you've felt it. Without plans, agents drift, redo each other's work, and produce three half-finished things instead of one shipped one. With a plan, the work compresses. Six months to four hours isn't because the agents are faster than humans at any one step. It's because the planning stage stops being a meeting and starts being a 60-second exchange.
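The plan-first rule reads naturally as a small state machine: draft, guard review, human approval, execution. The sketch below is one way to encode it, not the repo's implementation. The reviewer set follows the agents named above; the class names, statuses, and the `"human"` approver label are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    APPROVED = "approved"
    EXECUTED = "executed"


# Guard agents named in the text; treating all four as required is an assumption.
REQUIRED_REVIEWERS = {"SPARTAN", "SCOPE", "CONVERT", "ICP"}


@dataclass
class Plan:
    title: str
    status: Status = Status.DRAFT
    reviews: set[str] = field(default_factory=set)

    def record_review(self, agent: str) -> None:
        """Record one guard review; the plan advances once all guards have weighed in."""
        self.reviews.add(agent)
        if self.status is Status.DRAFT and REQUIRED_REVIEWERS <= self.reviews:
            self.status = Status.REVIEWED

    def approve(self, approver: str) -> None:
        """Only the human operator can approve, and only after guard review."""
        if self.status is not Status.REVIEWED:
            raise RuntimeError("plan has not passed guard review")
        if approver != "human":
            raise PermissionError("only the human operator can approve a plan")
        self.status = Status.APPROVED

    def execute(self) -> None:
        """Refuse to run anything that has not been approved."""
        if self.status is not Status.APPROVED:
            raise RuntimeError("nothing ships without an approved plan")
        self.status = Status.EXECUTED
```

The point of raising instead of warning is that an unapproved send becomes impossible by construction, not just discouraged.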
Out of the 12, ICP is the one I want to show you end to end. It's the agent that decides whether a proposed send, sequence, segment, or experiment actually fits the customers Speak AI is built for. It's also the agent that's saved me from the most expensive mistakes, which is the kind of receipt I trust more than a metric.
ICP's profile file points it at a docs folder of power-user ICP profiles. Those profiles describe who we serve well, what signals predict long retention, and what kind of users we historically misfire on. The profiles are written in plain language, not in SQL. They're opinionated. They get updated when I learn something new from talking to customers, not when a dashboard moves.
Say BEACON has drafted a re-engagement sequence aimed at trials that didn't convert in the first 14 days. COMMAND spawns ICP to validate the cohort. ICP reads the segment definition, compares it against the power-user profile, and asks the questions a careful colleague would ask: are we including any cohort that historically over-indexed on free-trial behaviour but never showed signs of fit? Are we excluding the long-tail trials that took 30 days to activate because of a feature they discovered late? Is there a use case in this cohort we don't actually support well, where re-engaging is going to lead to a worse experience, not a better one?
The output is a short markdown response with three things: a verdict (proceed, modify, or reject), the cohort logic ICP would change if it were running this, and the trade-off in plain language. No dashboards, no screenshots, no metrics theatre. Just the read.
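That three-part response maps onto a very small data structure. A sketch, assuming the field names and the markdown layout (neither is specified above); the example values are invented to show the shape and are not a real ICP read.

```python
from dataclasses import dataclass
from typing import Literal

Verdict = Literal["proceed", "modify", "reject"]  # the three verdicts named in the text


@dataclass(frozen=True)
class IcpRead:
    verdict: Verdict
    cohort_change: str  # the cohort logic ICP would change
    trade_off: str      # the trade-off in plain language

    def to_markdown(self) -> str:
        """Render the read as a short markdown response."""
        return (
            f"**Verdict:** {self.verdict}\n\n"
            f"**Cohort logic to change:** {self.cohort_change}\n\n"
            f"**Trade-off:** {self.trade_off}\n"
        )


# Invented example values, for illustration only.
read = IcpRead(
    verdict="modify",
    cohort_change="Move trials that activated after day 14 into a separate, later branch.",
    trade_off="A smaller first send, but late activators are not treated as lost.",
)
print(read.to_markdown())
```

Keeping the output to three fixed fields is what makes the read quick to review: the verdict is the first line, and everything else explains it.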
On the access side: ICP reads cohort data from our internal systems within the same rules I'd follow as a founder myself. The agent operates with the same access I have and surfaces the same kinds of reads I'd do manually, just faster and more consistently. Nothing about that boundary is hand-wavy magic. It's the same boundary I'd recommend any operator set when wiring agents up to their own data.
The ICP read is the difference between “we sent to a clean segment” and “we sent to the right segment.” Most growth tooling can do the first. The second is where founder-level judgment used to be the only available option, and it's the part I was most skeptical an agent could handle. A year in: ICP catches things I would have missed, and pushes back on plans I would have approved. That's the bar I needed to see, and it's the one ICP cleared.
If the 12 agents are the team, squad-skills are how I actually call the team into a room. They're a set of cross-repo command surfaces, shared between my growth repo and my product repo, that handle agent selection, planning broadcasts, parallel spawning, and the audit-to-publish loop. They live in the Speak AI client repo because they started there, but they apply just as much to the Growth Engine.
The way to think about squad-skills is as a meta-layer. Without them, I'd have to remember which agent owns which kind of work, manually load the right profile, and run each step myself. With them, I type one command, the right agents get selected based on the domain of the change, the right context files get loaded, and the squad runs the full process from initial audit through to a publishable plan.
This is also where the Growth Engine plugs into the rest of how I work. The same squad-skills that orchestrate growth agents orchestrate engineering agents over in the product codebase. The Pillar 1 page on agentic engineering will go deep on that side. The point for this page is: the same operating model spans growth and code, and the squad-skills layer is the bridge.
The short version of what's in the squad-skills layer: a planner that runs the broadcast, a review skill that runs the adversarial pass, and a resume skill that lets me pick up an in-flight plan in a new session without losing state. That resume flow is one of the pieces I'm most proud of. It lets me /clear a long session, keep the context that matters, and roll work forward through master plans, child plans, and phases without losing where I was. That's what makes day-after-day continuity possible inside a token-bounded plan.
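The core of any resume flow like this is finding where the work stopped. A minimal sketch of that lookup over the master plan, child plan, and phase hierarchy mentioned above. The text does not describe how plan state is stored, so the in-memory dataclasses, the `done` flag, and the plan names here are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Phase:
    name: str
    done: bool = False


@dataclass
class ChildPlan:
    name: str
    phases: list[Phase] = field(default_factory=list)


@dataclass
class MasterPlan:
    name: str
    children: list[ChildPlan] = field(default_factory=list)


def next_phase(master: MasterPlan) -> Optional[tuple[str, str]]:
    """Return (child plan, phase) for the first unfinished phase, or None if all are done."""
    for child in master.children:
        for phase in child.phases:
            if not phase.done:
                return child.name, phase.name
    return None


# Hypothetical plan tree for illustration.
master = MasterPlan(
    "lifecycle audit",
    [
        ChildPlan("trial sequence", [Phase("audit", done=True), Phase("rewrite")]),
        ChildPlan("win-back sequence", [Phase("audit")]),
    ],
)
print(next_phase(master))
```

If the state lives in files on disk instead of in the session, a fresh session after `/clear` only needs this one lookup to know what to load next.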
Two things sit underneath squad-skills and deserve their own callout because they're easy to miss: the CLAUDE.md context layer and the auto-memory system. CLAUDE.md is where each repo's rules live, per-project, so every agent that spawns reads the same operating rules I do. Memory is the layer that persists what I've learned (preferences, decisions, feedback) across sessions, so I don't have to teach the team the same lesson twice. Together they're the difference between “AI that helps for an hour” and “AI that's actually part of the team.”
Quick context before the numbers. Speak AI has done $2 million CAD+ in lifetime revenue, but a lot of that was built well before any of this existed. The Growth Engine is what's running the day-to-day growth motion now, and the cleanest receipts are the ones from the months I've been actively ramping it up. So this section is more about “what's the engine moving this year” than “what's the company's lifetime total.”
April 2026 vs March 2026 (Stripe)
I'm being deliberate about saying “what we're seeing right now.” Things are up across the board this month. That might not hold next month. The point of publishing it is to be transparent about the direction, not to claim a permanent state. (I'm working on a way to put a live snapshot of these numbers directly on this page, so what you see is always current. For now, it's manually updated. Live feed: coming soon.)
What the engine has shipped
git log, not a vanity metric.
A lot of this work happens while I'm on the walking pad dictating into the laptop. That's the operating mode that's emerged: voice in, agents propose, I review on a real screen later, ship. The leverage isn't that the agents are faster than I am at any single step. It's that the cost of starting any specific piece of work has dropped to near zero. I describe the change in a sentence, and a plan exists 90 seconds later. That changes what kinds of experiments are worth running, which changes what kinds of bets I take.
This page is the pillar. Each of the systems below is getting its own deep-dive page over the next few weeks, with the same level of detail you've seen so far. I'd rather ship them when they're real than link to placeholder pages today.
A page per agent. Profile, scope, decision protocol, sample plans, where they push back, where they get it wrong. The deepest reference for anyone building their own roster.
How BEACON and QUILL run 25 templates and 39 sequences across SendGrid and Intercom. Deliverability, suppression, the audit loop, and the actual copy patterns we've found that work.
SEO agent walkthrough. GSC + GA4 ingestion, query-to-page correlation, programmatic page generation, and the editorial bar that keeps it from turning into AI slop.
SUPPORT reading Intercom conversations in bulk. Topic clusters, resolution quality, and the documentation gaps that keep generating the same five tickets.
Affiliate program structure, partner outreach, and the lifecycle work specific to a partner audience. Sits closer to lifecycle than to ads in our setup.
Social ops: the small surface, the honest cadence, and how QUILL drafts without the page sounding like every other founder-on-LinkedIn account.
Paid acquisition: where we spend, where we don't, and what the agents are doing with the ad-side data versus the organic side.
The website-optimizer module: experiment design, the CRO discipline, and how FUNNEL ties test results back to lifecycle stage movement.
The newest surface and the one I'm most curious about. Script pipeline, agent-assisted editing notes, and where the line between “agent draft” and “founder voice” needs to stay sharp.
If there's a sub-system you want to read first, email me (link in the CTAs below) and I'll move it up the queue. I genuinely want to know what's most useful, and I'd rather ship in the order people actually want than the order I assumed.
I'm publishing the growth repo. Not a sanitized template version, the actual repo: AGENTS.md, the agent profile files, the modules, the scripts. Credentials and customer data are gitignored (and gitleaks-scanned) the same way they are in private. Everything else, you'll see.
The public link will land here. Drop me an email below if you want the heads-up the day it goes live.
I'm publishing it because the thing I most wanted a year ago was someone else's real repo to read. Frameworks didn't help me. Template packs didn't help me. The repo of a founder running this on a real product would have saved me months. So that's what I'm putting out, and I'm putting it out with the parts I'd do differently still in it, because that's where the actual learning is.
In practice our team runs a couple of Claude Max 20x plans and rotates between them when one hits its weekly cap on heavy work. Even doubled up, the total Anthropic spend stays well under what a single junior marketing hire would cost. Token efficiency (SCROOGE's whole job) is what keeps this affordable.
Two honest caveats. First, the Anthropic spend is the only number I'm counting here. The rest of the stack (SendGrid, Intercom, GSC, GA4, Stripe, Paddle, RevenueCat, the WordPress site, the Cloudways VPS) is a separate set of costs that any growth team would have anyway. The point of this number is the leverage from the agent layer specifically.
Second, this does not replace the human side of a marketing hire. It does not bring opinions to a meeting, push back at the right moment in a strategy debate, or build relationships with partners. What it does is take the mechanical work off the founder's plate so the founder can do the parts only the founder can do. I'd still hire a great marketer the moment I could justify it. I just wouldn't hire one to do the things the agents already do well.
A few things I got wrong, and one I'd keep exactly the same.
I'd start with the guards before the producers. The first month I built BEACON and QUILL and started shipping copy and sequences. It worked, but I was the only safety net. The week I added SPARTAN and SCOPE was the week the whole system actually got faster, because I stopped second-guessing every plan myself. SCOPE in particular was the agent that turned my “I want to build this thing” energy into refined plans split into phases that would actually get shipped. If you're building your own version, write the adversarial and scope agents first. The producing agents are the easy part.
I'd name agents earlier. The first version of this had three agents called “marketer,” “analyst,” and “writer.” Naming them BEACON, QUILL, and SEO did something I didn't expect: it gave them identity, made it easier to delegate to them in plain language, and made me think about scope more carefully. The codenames feel silly until you've used them for a week.
I'd put the docs in the repo from day one. The ICP profiles, the lifecycle journey notes, the audit history. I had a version where the agents pulled context from Notion and the latency alone made me hate it. Markdown in the repo, loaded on demand, is the only setup that's felt right.
I'd resist the urge to add more agents. 12 is already a lot. Each additional agent is overhead: another scope to defend, another file to maintain, another opinion to coordinate. The marginal value drops fast. If I were starting today I'd cap at 8 and add the rest only when I genuinely felt the gap.
The thing I'd keep. The plan-first rule. Every time I've been tempted to let an agent execute without a plan, I've regretted it within a week. The plan isn't friction. The plan is the product.
Email me directly. No newsletter, no autoresponder, just a one-line note from me when it ships.
[email protected]
20 minutes, no pitch. We'll talk through what your version of this would look like and whether it's the right move right now.
Book a 20-min call →
An AI growth engine is a system of named, scoped AI agents that handle marketing and growth operations (lifecycle email, SEO, support analytics, funnel diagnostics, revenue reporting) under a single orchestrator, with a human approving every customer-facing change. It is not a single chatbot and not a workflow tool. It is closer to an in-house growth team that costs the price of one software subscription.
Yes, with two non-negotiables. First, no agent sends to a real recipient list without a human-reviewed plan. Second, agents work within the same access boundaries you would set for a teammate. With those two rules in place, the Growth Engine on this page has run production lifecycle email, SEO, and support analytics for a $2M CAD+ SaaS.
Workflows execute rules you wrote in advance. Agents read context, draft proposals, and push back when something looks off. The two layers complement each other: in this stack, agents propose the change, I approve it, and a workflow does the actual sending. You don't replace your workflow tool. You add a judgment layer on top of it.
The Anthropic spend per Claude Max 20x plan is about $280 CAD / month. We run a couple of plans and rotate between them. Other tools in the stack (SendGrid, Intercom, GSC, GA4, Stripe, etc.) are separate and not specific to the agent layer. Even doubled up, the total Anthropic spend stays well under a single junior hire.
It helps and is not strictly required. The repo is Python and Markdown. If you can edit a config file, write a clear instruction in plain English, and run a script from the command line, you have the floor. Claude Code does the rest. The harder skill is judgment about what you'd ask the agents to do, which is a growth-thinking skill, not a coding one.
The current shape took about 90 days from the first agent file to a 12-agent system running production work. The first usable version of two agents took a weekend. The leverage compounds: each new agent is faster to spin up than the last because the scaffolding is in place.
No. The repo will include the agent definitions, modules, scripts, and the operating rules. Credentials and anything sensitive are gitignored and gitleaks-scanned. The patterns are public; the data stays private. That's a permanent boundary, not a v1 limitation.