# What a Portfolio of AI Agents Actually Costs

Almost every week, a founder or operator in my network sends me some version of the same message. "We're planning to build a few AI agents for the business. What's this actually going to cost?"
The answers they get online are useless. Vendor marketing either understates costs by ignoring everything except API tokens, or overstates them by modeling a hypothetical enterprise deployment nobody is actually building. The real cost of a realistic portfolio — three to five custom agents aligned to actual business workflows — sits somewhere in the middle, and the gap between the two is where most budgeting mistakes happen.
This article is the honest version. I'll walk through a realistic portfolio, break down the actual cost categories (API tokens are a third of the bill at most, and often far less), and compare what the same portfolio looks like on Claude, OpenAI, and self-hosted Llama.
## The scenario: five agents across one mid-sized business
Let's use a plausible mid-market company: 150 employees, professional services, growing. Their leadership team decides to build five agents over twelve months.
| Agent | Purpose | Daily volume | Primary model need |
|---|---|---|---|
| Customer support triage | Classifies and routes incoming tickets | 2,000 tickets | Lightweight |
| Proposal drafting | Generates first-draft client proposals | 40 proposals | Heavyweight |
| Sales research | Researches prospects before calls | 100 lookups | Mid-tier |
| Internal knowledge | Answers staff questions from internal docs | 500 questions | Mid-tier |
| Finance ops | Categorizes invoices, flags anomalies | 300 documents | Lightweight |
This is realistic. Not toy scale. Not enterprise scale.
## The six cost categories nobody itemizes properly
Before comparing stacks, you need to understand what you're actually paying for. API tokens are the visible cost. Five others matter.
| Category | What it covers | Typical % of total |
|---|---|---|
| Model inference (API tokens) | Actual per-call charges to the model provider | 30–40% |
| Agent runtime infrastructure | Sandboxing, state, credentials, logging | 10–15% |
| Development & integration | Building, connecting to your systems | 25–35% |
| Data pipelines & vector stores | Embeddings, storage, retrieval | 5–10% |
| Observability & guardrails | Monitoring, evals, policy enforcement | 5–10% |
| Ongoing human oversight | PM + domain expert time post-launch | 10–15% |
Most "it's just $X in API costs" estimates are off by 2–3x because they only count the first line. When a CFO asks what agents cost, show them this table first. It reframes the conversation from "what's the API bill?" to "what's the total delivery cost?"
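One way to make the reframe concrete: if inference typically runs 30–40% of total, a quoted API-only figure should be grossed up by roughly 2.5–3.3x. A minimal sketch (the 35% default share is an illustrative assumption, not a rule):

```python
def implied_total(api_monthly: float, inference_share: float = 0.35) -> float:
    """Gross an API-only estimate up to a total-delivery estimate."""
    return api_monthly / inference_share

# A "$1,000/month in tokens" quote implies roughly $2,500-$3,300/month all-in.
low, high = implied_total(1000, 0.40), implied_total(1000, 0.30)
print(f"${low:,.0f} to ${high:,.0f} per month")
```

Handing a CFO the grossed-up number first avoids the anchoring problem entirely.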
## Stack 1: Claude (Anthropic)
Claude's current pricing is the cleanest to model because Anthropic has stabilized its tiers. Haiku 4.5 sits at $1 per million input tokens and $5 per million output. Sonnet 4.6 at $3/$15. Opus 4.7 at $5/$25. Prompt caching cuts cached input cost by 90%. Batch API cuts everything else by 50% for non-real-time workloads.
For the portfolio above, I'd route: support triage and finance ops to Haiku, sales research and internal knowledge to Sonnet, and proposal drafting to Opus (the only agent where reasoning depth earns the premium).
Running the math on realistic token volumes:
| Agent | Model | Monthly tokens (input/output) | Monthly cost |
|---|---|---|---|
| Support triage | Haiku 4.5 | 60M / 10M | ~$110 |
| Proposal drafting | Opus 4.7 | 24M / 6M | ~$270 |
| Sales research | Sonnet 4.6 | 30M / 5M | ~$165 |
| Internal knowledge | Sonnet 4.6 | 45M / 7M | ~$240 |
| Finance ops | Haiku 4.5 | 18M / 3M | ~$33 |
| Total inference | | | ~$818/mo |
Add Managed Agents runtime at $0.08 per session-hour. Assuming modest concurrency, budget around $400/month. Prompt caching and batch discounts can bring the full inference bill down meaningfully if implemented well — call it $900–1,300/month all-in for the Claude stack's direct costs.
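The per-agent math above is easy to reproduce and stress-test. The sketch below uses the rates and token volumes from this section; the cached variant assumes cached input bills at 10% of the input rate (per the 90% discount), and the 70% cache-hit rate is a guess, not a measured figure:

```python
# Rates per million tokens (input, output), as quoted above.
RATES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}

# (agent, model, input tokens in millions, output tokens in millions)
PORTFOLIO = [
    ("support triage", "haiku", 60, 10),
    ("proposal drafting", "opus", 24, 6),
    ("sales research", "sonnet", 30, 5),
    ("internal knowledge", "sonnet", 45, 7),
    ("finance ops", "haiku", 18, 3),
]

def monthly_cost(model: str, in_m: float, out_m: float,
                 cache_hit: float = 0.0) -> float:
    """Inference cost in dollars; cached input bills at 10% of the input rate."""
    in_rate, out_rate = RATES[model]
    fresh, cached = in_m * (1 - cache_hit), in_m * cache_hit
    return fresh * in_rate + cached * in_rate * 0.1 + out_m * out_rate

base = sum(monthly_cost(m, i, o) for _, m, i, o in PORTFOLIO)
cached = sum(monthly_cost(m, i, o, cache_hit=0.7) for _, m, i, o in PORTFOLIO)
print(f"base: ${base:.0f}/mo, with 70% cache hits: ${cached:.0f}/mo")
```

At that hypothetical hit rate the bill drops from $818 to roughly $550 a month, which is where the $900–1,300 all-in band (inference plus runtime) comes from.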
Strengths: Cleanest pricing model, 1M-token context at flat rates on Sonnet and Opus, strongest agent tooling out of the box, Managed Agents handles runtime so your team doesn't.
Trade-offs: Opus is not the cheapest flagship on a headline-rate basis. The savings come from caching and tiered routing, which require discipline.
## Stack 2: OpenAI (GPT-5 family)
GPT-5.4 sits at $2.50 per million input / $15.00 per million output, with cached input at $1.25/M. GPT-5 Mini at $0.25/$2.00 per million, GPT-5 Nano even cheaper. Batch API offers 50% off, matching Anthropic.
For the same portfolio, routing would be: triage and finance to Nano, research and knowledge to Mini, proposals to GPT-5.4.
| Agent | Model | Monthly tokens | Monthly cost |
|---|---|---|---|
| Support triage | GPT-5 Nano | 60M / 10M | ~$30 |
| Proposal drafting | GPT-5.4 | 24M / 6M | ~$150 |
| Sales research | GPT-5 Mini | 30M / 5M | ~$18 |
| Internal knowledge | GPT-5 Mini | 45M / 7M | ~$25 |
| Finance ops | GPT-5 Nano | 18M / 3M | ~$10 |
| Total inference | | | ~$233/mo |
This is the part the marketing emphasizes. OpenAI's low end is dramatically cheaper per token than Claude's low end. Add infrastructure — you'll build your own agent runtime on top of the Assistants API, Responses API, or a framework like LangChain — and budget another $300–500/month in cloud and tooling costs.
Strengths: Cheapest raw token pricing at the low end, widest model selection, deepest integration ecosystem.
Trade-offs: GPT-5.4's 1M context window has a 272K threshold; input tokens above it bill at double the rate. Agent infrastructure is largely your problem: you're buying the model, not the full stack. And reasoning-token billing in Thinking modes can inflate budgets unpredictably.
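The 272K threshold is worth modeling before you lean on long-context prompts. Assuming the simplest interpretation (input tokens past 272K in a single request bill at twice the base rate), the marginal cost of a long prompt looks like this:

```python
def gpt54_input_cost(input_tokens: int,
                     base_per_m: float = 2.50,
                     threshold: int = 272_000) -> float:
    """Input cost in dollars; tokens past the threshold bill at 2x the base rate."""
    below = min(input_tokens, threshold)
    above = max(0, input_tokens - threshold)
    return (below * base_per_m + above * base_per_m * 2) / 1_000_000

# A 400K-token prompt costs $1.32, not the $1.00 a flat rate would suggest.
print(gpt54_input_cost(400_000))
```

For retrieval-heavy agents that routinely stuff context, that surcharge compounds quickly across thousands of daily calls.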
## Stack 3: Self-hosted Llama (open-source)
This is the path most teams underestimate, in both directions. It's often cheaper at scale than anyone expects, and often more expensive at small scale than anyone warned.
Hosted Llama 3.3 70B via Groq runs $0.59 input / $0.79 output per million tokens. Together AI around $0.88/M blended. AWS Bedrock at $0.72. Llama 4 Maverick via hosted providers around $0.15/$0.60.
For the same portfolio on hosted Llama:
| Agent | Model | Approx. monthly cost |
|---|---|---|
| Support triage | Llama 3.1 8B (Groq) | ~$10 |
| Proposal drafting | Llama 4 Maverick | ~$40 |
| Sales research | Llama 3.3 70B | ~$30 |
| Internal knowledge | Llama 3.3 70B | ~$40 |
| Finance ops | Llama 3.1 8B | ~$5 |
| Total inference | | ~$125/mo |
That's the cheap version. Now add the real cost: engineering time.
Self-hosting breakeven sits around 50 million tokens per month minimum, and requires sustained GPU utilization above 60%. Below that, the API wins. Our portfolio is right at the margin. More importantly, you need MLOps capacity — someone who handles deployment, monitoring, scaling, model updates, and security. That's a real person, not a line item.
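The breakeven logic is simple enough to sketch. The figures below are hypothetical placeholders (a small single-GPU deployment at $300/month, a frontier blended rate of $6/M, a hosted-Llama rate of $0.72/M); the point is that breakeven volume scales linearly with whatever API rate you would otherwise be paying:

```python
def breakeven_tokens_m(gpu_monthly: float, api_rate_per_m: float) -> float:
    """Monthly token volume (in millions) at which self-hosting matches the API."""
    return gpu_monthly / api_rate_per_m

# Against a frontier blended rate, a small GPU box breaks even around 50M tokens/month...
print(breakeven_tokens_m(300, 6.00))   # 50M tokens/month
# ...but against a cheap hosted-Llama rate, you need roughly 8x the volume.
print(breakeven_tokens_m(300, 0.72))
```

Utilization matters because the GPU bill is fixed: at 60% utilization you pay the same $300 for the idle 40%, which effectively raises your real per-token cost by two-thirds.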
Strengths: Lowest marginal cost at sustained scale. Complete control over data residency and fine-tuning. No vendor lock-in.
Trade-offs: You absorb infrastructure cost regardless of usage. Quality gap to frontier models, especially on complex agentic workflows. Tool-use reliability is meaningfully weaker than Claude or GPT-5.4. The real cost isn't tokens; it's engineering time.
## The full picture: what this portfolio actually costs to build
Now let's add the five categories that never show up in API pricing articles.
| Cost category | Claude stack | OpenAI stack | Llama self-host |
|---|---|---|---|
| Model inference (monthly) | ~$900 | ~$500 (incl. reasoning-token headroom) | ~$125 |
| Runtime infrastructure | Managed (~$400) | Build your own (~$400) | DIY (~$2,500+) |
| Initial build (one-time) | $45K–75K | $50K–85K | $85K–130K |
| Vector store / data plumbing | ~$200/mo | ~$200/mo | ~$300/mo |
| Observability & evals | ~$300/mo | ~$300/mo | ~$400/mo |
| PM + domain expert time | ~$4,000/mo | ~$4,000/mo | ~$4,000/mo |
| Year-one total (rough) | $110K–160K | $115K–170K | $175K–245K |
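The year-one rows can be sanity-checked with a quick roll-up, assuming the monthly figures run for all twelve months (in practice agents launch in stages, so these lean toward upper bounds):

```python
# (monthly recurring $, one-time build low, one-time build high), from the table above.
STACKS = {
    "claude": (900 + 400 + 200 + 300 + 4000, 45_000, 75_000),
    "openai": (500 + 400 + 200 + 300 + 4000, 50_000, 85_000),
    "llama":  (125 + 2500 + 300 + 400 + 4000, 85_000, 130_000),
}

for name, (monthly, build_lo, build_hi) in STACKS.items():
    lo, hi = monthly * 12 + build_lo, monthly * 12 + build_hi
    print(f"{name}: ${lo:,.0f} to ${hi:,.0f} year one")
```

The computed bands land roughly inside the table's published ranges; the published ranges add headroom for scope creep and partial-year ramp.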
A few things worth sitting with.
In this worked example, API inference lands at just 5–10% of total year-one cost, well below the "typical" 30–40% share in the category table, because the one-time build and ongoing human time dominate the first year. That's why "how much do the tokens cost?" is the wrong first question. The right first question is "who's building and maintaining this?"
The Llama path is cheapest in tokens and most expensive in total. This reverses at genuine scale — past ~200 million tokens per month with sustained utilization — but most mid-market portfolios never get there. Self-hosting at small scale is an engineering choice, not a cost one.
The Claude and OpenAI stacks are closer than the headline numbers suggest. OpenAI wins on raw tokens, Claude wins on bundled runtime, and the total-cost difference is small enough to make the decision on other factors: agent tooling quality, data residency, security posture, integration depth with the rest of your stack.
## What this means for how you budget
Three concrete recommendations for anyone planning a portfolio build.
First, don't let procurement drive the stack decision on API pricing alone. The model cost is the smallest variable in your total. Prioritize the stack that ships agents faster with less engineering lift — that's where the real money goes.
Second, budget for model routing, not model lock-in. The most cost-efficient portfolios use two or three models per stack, routing simple tasks to small models and complex ones to flagship. A 70/20/10 split (small/mid/flagship) can cut inference costs by more than half versus running everything on flagship.
Third, factor the human oversight cost honestly. The agents still need a PM and a domain expert post-launch, reviewing outputs, refining skills, updating guardrails. Teams that treat this as zero-cost end up with degrading quality in month six.
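The routing claim in the second recommendation is easy to verify with blended per-million rates. The rates below are illustrative placeholders for small, mid, and flagship tiers, not any provider's price list:

```python
def blended_rate(split: list[float], rates: list[float]) -> float:
    """Cost per million tokens for a traffic split across model tiers."""
    return sum(share * rate for share, rate in zip(split, rates))

RATES = [2.0, 6.0, 12.0]  # hypothetical small / mid / flagship, $/M blended
routed = blended_rate([0.7, 0.2, 0.1], RATES)
flagship_only = RATES[-1]
print(f"routed: ${routed:.2f}/M vs flagship-only: ${flagship_only:.2f}/M")
print(f"savings: {1 - routed / flagship_only:.0%}")
```

Under these rates the routed portfolio costs about 68% less than running everything on the flagship, comfortably more than the "half" claimed above.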
## The honest bottom line
For a five-agent portfolio at the mid-market scale described above, expect year-one costs in the range of $110K–170K on a Claude or OpenAI stack, $175K–245K on self-hosted Llama. The delta between Claude and OpenAI is small enough that fit-to-business should drive the choice. The delta to self-hosted Llama is large enough that it's only worth considering at genuine volume or with specific data residency requirements.
If someone quotes you $15K for a custom agent build or $3K/month for a portfolio, they're either solving a much smaller problem than you're describing or leaving 80% of the real cost out of the conversation. Both happen. Neither helps.
The right question isn't "how cheap can we make this?" It's "what's the smallest portfolio that earns its cost back in the first year?" For most mid-market businesses, that's one or two agents done well — not five done quickly. Start there. Scale once the first one has paid for itself.











