# What a Portfolio of AI Agents Actually Costs

Almost every week, a founder or operator in my network sends me some version of the same message. "We're planning to build a few AI agents for the business. What's this actually going to cost?"
The answers they get online are useless. Vendor marketing either understates costs by ignoring everything except API tokens, or overstates them by modeling a hypothetical enterprise deployment nobody is actually building. The real cost of a realistic portfolio — three to five custom agents aligned to actual business workflows — sits somewhere in the middle, and the gap between the two is where most budgeting mistakes happen.
This article is the honest version. I'll walk through a realistic portfolio, break down the actual cost categories (API tokens are a third of the bill at most, and often far less), and compare what the same portfolio looks like on Claude, OpenAI, and self-hosted Llama.
## The scenario: five agents across one mid-sized business
Let's use a plausible mid-market company: 150 employees, professional services, growing. Their leadership team decides to build five agents over twelve months.
| Agent | Purpose | Daily volume | Primary model need |
|---|---|---|---|
| Customer support triage | Classifies and routes incoming tickets | 2,000 tickets | Lightweight |
| Proposal drafting | Generates first-draft client proposals | 40 proposals | Heavyweight |
| Sales research | Researches prospects before calls | 100 lookups | Mid-tier |
| Internal knowledge | Answers staff questions from internal docs | 500 questions | Mid-tier |
| Finance ops | Categorizes invoices, flags anomalies | 300 documents | Lightweight |
This is realistic. Not toy scale. Not enterprise scale.
## The six cost categories nobody itemizes properly
Before comparing stacks, you need to understand what you're actually paying for. API tokens are the visible cost. Five others matter.
| Category | What it covers | Typical % of total |
|---|---|---|
| Model inference (API tokens) | Actual per-call charges to the model provider | 30–40% |
| Agent runtime infrastructure | Sandboxing, state, credentials, logging | 10–15% |
| Development & integration | Building, connecting to your systems | 25–35% |
| Data pipelines & vector stores | Embeddings, storage, retrieval | 5–10% |
| Observability & guardrails | Monitoring, evals, policy enforcement | 5–10% |
| Ongoing human oversight | PM + domain expert time post-launch | 10–15% |
Most "it's just $X in API costs" estimates are off by 2–3x because they only count the first line. When a CFO asks what agents cost, show them this table first. It reframes the conversation from "what's the API bill?" to "what's the total delivery cost?"
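One way to make the reframe concrete: if inference typically runs 30–40% of total, a quoted API-only figure should be grossed up by roughly 2.5–3.3x. A minimal sketch (the 35% default share is an illustrative assumption, not a rule):

```python
def implied_total(api_monthly: float, inference_share: float = 0.35) -> float:
    """Gross an API-only estimate up to a total-delivery estimate."""
    return api_monthly / inference_share

# A "$1,000/month in tokens" quote implies roughly $2,500-$3,300/month all-in.
low, high = implied_total(1000, 0.40), implied_total(1000, 0.30)
print(f"${low:,.0f} to ${high:,.0f} per month")
```

Handing a CFO the grossed-up number first avoids the anchoring problem entirely.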
## Stack 1: Claude (Anthropic)
Claude's current pricing is the cleanest to model because Anthropic has stabilized its tiers. Haiku 4.5 sits at $1 per million input tokens and $5 per million output. Sonnet 4.6 at $3/$15. Opus 4.7 at $5/$25. Prompt caching cuts cached input cost by 90%. Batch API cuts everything else by 50% for non-real-time workloads.
For the portfolio above, I'd route: support triage and finance ops to Haiku, sales research and internal knowledge to Sonnet, and proposal drafting to Opus (the only agent where reasoning depth earns the premium).
Running the math on realistic token volumes:
| Agent | Model | Monthly tokens (input/output) | Monthly cost |
|---|---|---|---|
| Support triage | Haiku 4.5 | 60M / 10M | ~$110 |
| Proposal drafting | Opus 4.7 | 24M / 6M | ~$270 |
| Sales research | Sonnet 4.6 | 30M / 5M | ~$165 |
| Internal knowledge | Sonnet 4.6 | 45M / 7M | ~$240 |
| Finance ops | Haiku 4.5 | 18M / 3M | ~$33 |
| Total inference | | | ~$818/mo |
Add Managed Agents runtime at $0.08 per session-hour. Assuming modest concurrency, budget around $400/month. Prompt caching and batch discounts can bring the full inference bill down meaningfully if implemented well — call it $900–1,300/month all-in for the Claude stack's direct costs.
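The per-agent math above is easy to reproduce and stress-test. The sketch below uses the rates and token volumes from this section; the cached variant assumes cached input bills at 10% of the input rate (per the 90% discount), and the 70% cache-hit rate is a guess, not a measured figure:

```python
# Rates per million tokens (input, output), as quoted above.
RATES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}

# (agent, model, input tokens in millions, output tokens in millions)
PORTFOLIO = [
    ("support triage", "haiku", 60, 10),
    ("proposal drafting", "opus", 24, 6),
    ("sales research", "sonnet", 30, 5),
    ("internal knowledge", "sonnet", 45, 7),
    ("finance ops", "haiku", 18, 3),
]

def monthly_cost(model: str, in_m: float, out_m: float,
                 cache_hit: float = 0.0) -> float:
    """Inference cost in dollars; cached input bills at 10% of the input rate."""
    in_rate, out_rate = RATES[model]
    fresh, cached = in_m * (1 - cache_hit), in_m * cache_hit
    return fresh * in_rate + cached * in_rate * 0.1 + out_m * out_rate

base = sum(monthly_cost(m, i, o) for _, m, i, o in PORTFOLIO)
cached = sum(monthly_cost(m, i, o, cache_hit=0.7) for _, m, i, o in PORTFOLIO)
print(f"base: ${base:.0f}/mo, with 70% cache hits: ${cached:.0f}/mo")
```

At that hypothetical hit rate the bill drops from $818 to roughly $550 a month, which is where the $900–1,300 all-in band (inference plus runtime) comes from.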
Strengths: Cleanest pricing model, 1M-token context at flat rates on Sonnet and Opus, strongest agent tooling out of the box, Managed Agents handles runtime so your team doesn't.
Trade-offs: Opus is not the cheapest flagship on a headline-rate basis. The savings come from caching and tiered routing, which require discipline.
## Stack 2: OpenAI (GPT-5 family)
GPT-5.4 sits at $2.50 per million input / $15.00 per million output, with cached input at $1.25/M. GPT-5 Mini at $0.25/$2.00 per million, GPT-5 Nano even cheaper. Batch API offers 50% off, matching Anthropic.
For the same portfolio, routing would be: triage and finance to Nano, research and knowledge to Mini, proposals to GPT-5.4.
| Agent | Model | Monthly tokens | Monthly cost |
|---|---|---|---|
| Support triage | GPT-5 Nano | 60M / 10M | ~$30 |
| Proposal drafting | GPT-5.4 | 24M / 6M | ~$150 |
| Sales research | GPT-5 Mini | 30M / 5M | ~$18 |
| Internal knowledge | GPT-5 Mini | 45M / 7M | ~$25 |
| Finance ops | GPT-5 Nano | 18M / 3M | ~$10 |
| Total inference | | | ~$233/mo |
This is the part the marketing emphasizes. OpenAI's low end is dramatically cheaper per token than Claude's low end. Add infrastructure — you'll build your own agent runtime on top of the Assistants API, Responses API, or a framework like LangChain — and budget another $300–500/month in cloud and tooling costs.
Strengths: Cheapest raw token pricing at the low end, widest model selection, deepest integration ecosystem.
Trade-offs: GPT-5.4's 1M context window has a 272K threshold; input tokens above it bill at double the rate. Agent infrastructure is largely your problem: you're buying the model, not the full stack. And reasoning-token billing in Thinking modes can inflate budgets unpredictably.
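The 272K threshold is worth modeling before you lean on long-context prompts. Assuming the simplest interpretation (input tokens past 272K in a single request bill at twice the base rate), the marginal cost of a long prompt looks like this:

```python
def gpt54_input_cost(input_tokens: int,
                     base_per_m: float = 2.50,
                     threshold: int = 272_000) -> float:
    """Input cost in dollars; tokens past the threshold bill at 2x the base rate."""
    below = min(input_tokens, threshold)
    above = max(0, input_tokens - threshold)
    return (below * base_per_m + above * base_per_m * 2) / 1_000_000

# A 400K-token prompt costs $1.32, not the $1.00 a flat rate would suggest.
print(gpt54_input_cost(400_000))
```

For retrieval-heavy agents that routinely stuff context, that surcharge compounds quickly across thousands of daily calls.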
## Stack 3: Self-hosted Llama (open-source)
This is the path most teams underestimate, in both directions. It's often cheaper at scale than anyone expects, and often more expensive at small scale than anyone warned.
Hosted Llama 3.3 70B via Groq runs $0.59 input / $0.79 output per million tokens. Together AI around $0.88/M blended. AWS Bedrock at $0.72. Llama 4 Maverick via hosted providers around $0.15/$0.60.
For the same portfolio on hosted Llama:
| Agent | Model | Approx. monthly cost |
|---|---|---|
| Support triage | Llama 3.1 8B (Groq) | ~$10 |
| Proposal drafting | Llama 4 Maverick | ~$40 |
| Sales research | Llama 3.3 70B | ~$30 |
| Internal knowledge | Llama 3.3 70B | ~$40 |
| Finance ops | Llama 3.1 8B | ~$5 |
| Total inference | | ~$125/mo |
That's the cheap version. Now add the real cost: engineering time.
Self-hosting breakeven sits around 50 million tokens per month minimum, and requires sustained GPU utilization above 60%. Below that, the API wins. Our portfolio is right at the margin. More importantly, you need MLOps capacity — someone who handles deployment, monitoring, scaling, model updates, and security. That's a real person, not a line item.
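The breakeven logic is simple enough to sketch. The figures below are hypothetical placeholders (a small single-GPU deployment at $300/month, a frontier blended rate of $6/M, a hosted-Llama rate of $0.72/M); the point is that breakeven volume scales linearly with whatever API rate you would otherwise be paying:

```python
def breakeven_tokens_m(gpu_monthly: float, api_rate_per_m: float) -> float:
    """Monthly token volume (in millions) at which self-hosting matches the API."""
    return gpu_monthly / api_rate_per_m

# Against a frontier blended rate, a small GPU box breaks even around 50M tokens/month...
print(breakeven_tokens_m(300, 6.00))   # 50M tokens/month
# ...but against a cheap hosted-Llama rate, you need roughly 8x the volume.
print(breakeven_tokens_m(300, 0.72))
```

Utilization matters because the GPU bill is fixed: at 60% utilization you pay the same $300 for the idle 40%, which effectively raises your real per-token cost by two-thirds.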
Strengths: Lowest marginal cost at sustained scale. Complete control over data residency and fine-tuning. No vendor lock-in.
Trade-offs: You absorb infrastructure cost regardless of usage. Quality gap to frontier models, especially on complex agentic workflows. Tool-use reliability is meaningfully weaker than Claude or GPT-5.4. The real cost isn't tokens; it's engineering time.
## The full picture: what this portfolio actually costs to build
Now let's add the five categories that never show up in API pricing articles.
| Cost category | Claude stack | OpenAI stack | Llama self-host |
|---|---|---|---|
| Model inference (monthly) | ~$900 | ~$500 (incl. reasoning-token headroom) | ~$125 |
| Runtime infrastructure | Managed (~$400) | Build your own (~$400) | DIY (~$2,500+) |
| Initial build (one-time) | $45K–75K | $50K–85K | $85K–130K |
| Vector store / data plumbing | ~$200/mo | ~$200/mo | ~$300/mo |
| Observability & evals | ~$300/mo | ~$300/mo | ~$400/mo |
| PM + domain expert time | ~$4,000/mo | ~$4,000/mo | ~$4,000/mo |
| Year-one total (rough) | $110K–160K | $115K–170K | $175K–245K |
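The year-one rows can be sanity-checked with a quick roll-up, assuming the monthly figures run for all twelve months (in practice agents launch in stages, so these lean toward upper bounds):

```python
# (monthly recurring $, one-time build low, one-time build high), from the table above.
STACKS = {
    "claude": (900 + 400 + 200 + 300 + 4000, 45_000, 75_000),
    "openai": (500 + 400 + 200 + 300 + 4000, 50_000, 85_000),
    "llama":  (125 + 2500 + 300 + 400 + 4000, 85_000, 130_000),
}

for name, (monthly, build_lo, build_hi) in STACKS.items():
    lo, hi = monthly * 12 + build_lo, monthly * 12 + build_hi
    print(f"{name}: ${lo:,.0f} to ${hi:,.0f} year one")
```

The computed bands land roughly inside the table's published ranges; the published ranges add headroom for scope creep and partial-year ramp.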
A few things worth sitting with.
In this worked example, API inference lands at just 5–10% of total year-one cost, well below the "typical" 30–40% share in the category table, because the one-time build and ongoing human time dominate the first year. That's why "how much do the tokens cost?" is the wrong first question. The right first question is "who's building and maintaining this?"
The Llama path is cheapest in tokens and most expensive in total. This reverses at genuine scale — past ~200 million tokens per month with sustained utilization — but most mid-market portfolios never get there. Self-hosting at small scale is an engineering choice, not a cost one.
The Claude and OpenAI stacks are closer than the headline numbers suggest. OpenAI wins on raw tokens, Claude wins on bundled runtime, and the total-cost difference is small enough to make the decision on other factors: agent tooling quality, data residency, security posture, integration depth with the rest of your stack.
## What this means for how you budget
Three concrete recommendations for anyone planning a portfolio build.
First, don't let procurement drive the stack decision on API pricing alone. The model cost is the smallest variable in your total. Prioritize the stack that ships agents faster with less engineering lift — that's where the real money goes.
Second, budget for model routing, not model lock-in. The most cost-efficient portfolios use two or three models per stack, routing simple tasks to small models and complex ones to flagship. A 70/20/10 split (small/mid/flagship) can cut inference costs by more than half versus running everything on flagship.
Third, factor the human oversight cost honestly. The agents still need a PM and a domain expert post-launch, reviewing outputs, refining skills, updating guardrails. Teams that treat this as zero-cost end up with degrading quality in month six.
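The routing claim in the second recommendation is easy to verify with blended per-million rates. The rates below are illustrative placeholders for small, mid, and flagship tiers, not any provider's price list:

```python
def blended_rate(split: list[float], rates: list[float]) -> float:
    """Cost per million tokens for a traffic split across model tiers."""
    return sum(share * rate for share, rate in zip(split, rates))

RATES = [2.0, 6.0, 12.0]  # hypothetical small / mid / flagship, $/M blended
routed = blended_rate([0.7, 0.2, 0.1], RATES)
flagship_only = RATES[-1]
print(f"routed: ${routed:.2f}/M vs flagship-only: ${flagship_only:.2f}/M")
print(f"savings: {1 - routed / flagship_only:.0%}")
```

Under these rates the routed portfolio costs about 68% less than running everything on the flagship, comfortably more than the "half" claimed above.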
## The honest bottom line
For a five-agent portfolio at the mid-market scale described above, expect year-one costs in the range of $110K–170K on a Claude or OpenAI stack, $175K–245K on self-hosted Llama. The delta between Claude and OpenAI is small enough that fit-to-business should drive the choice. The delta to self-hosted Llama is large enough that it's only worth considering at genuine volume or with specific data residency requirements.
If someone quotes you $15K for a custom agent build or $3K/month for a portfolio, they're either solving a much smaller problem than you're describing or leaving 80% of the real cost out of the conversation. Both happen. Neither helps.
The right question isn't "how cheap can we make this?" It's "what's the smallest portfolio that earns its cost back in the first year?" For most mid-market businesses, that's one or two agents done well — not five done quickly. Start there. Scale once the first one has paid for itself.











