How much does an AI agent cost to run Cross-System Task Execution?

Token cost benchmark for an autonomous Cross-System Task Execution agent, across 13 models. Prices as of 14 Jun 2026.

An agent for Cross-System Task Execution on the clean path costs about $0.0141 to $0.987 per outcome depending on the model, around 13x the cost of a single chat message. At 10,000 outcomes a month that is roughly $141 to $9,870.

Cost per outcome by model

| Model | $/1M in | $/1M out | Cost / outcome | Cost / month* |
|---|---|---|---|---|
| GPT-4o mini | $0.15 | $0.60 | $0.0141 | $141 |
| Llama 4 Maverick | $0.27 | $0.85 | $0.0242 | $242 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.0344 | $344 |
| DeepSeek V4 | $0.44 | $0.87 | $0.0370 | $370 |
| GPT-4.1 mini | $0.40 | $1.60 | $0.0376 | $376 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.0987 | $987 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.141 | $1,414 |
| Mistral Large 3 | $2.00 | $6.00 | $0.178 | $1,782 |
| GPT-4.1 | $2.00 | $8.00 | $0.188 | $1,878 |
| GPT-4o | $2.50 | $10.00 | $0.235 | $2,348 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.296 | $2,961 |
| Claude Opus 4.8 | $5.00 | $25.00 | $0.494 | $4,935 |
| Claude Fable 5 | $10.00 | $50.00 | $0.987 | $9,870 |

*At 10,000 outcomes per month. Rows are sorted by cost per outcome, so the cheapest model (GPT-4o mini) is listed first.
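
The monthly column is the per-outcome cost multiplied by the monthly volume. A minimal sketch of that arithmetic follows; `monthly_cost` is a hypothetical helper name, and the default volume is the 10,000 outcomes used in the footnote above.

```python
def monthly_cost(cost_per_outcome: float, outcomes_per_month: int = 10_000) -> float:
    """Scale a per-outcome dollar cost to a monthly dollar cost."""
    return cost_per_outcome * outcomes_per_month


# Per-outcome figures taken from the table: cheapest and most expensive model.
print(f"${monthly_cost(0.0141):,.0f}")  # GPT-4o mini
print(f"${monthly_cost(0.987):,.0f}")   # Claude Fable 5
```

Because the published per-outcome figures are rounded, recomputing from them can differ from the table by a few dollars. For example, $0.141 x 10,000 gives $1,410, while the table lists $1,414 for Gemini 2.5 Pro.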

What this agent does

The clean-path steps this benchmark prices:

  1. Resolve Targets
  2. Authorised scope?
  3. High-impact / irreversible?
  4. Execute Across Systems
  5. All steps succeeded?
  6. Verify & Reconcile

What drives the cost

This path runs 6 steps: 2 tool calls, 1 reasoning step, 3 decision points and 0 human checkpoints. Each tool step makes two model calls, and the agent re-reads its growing context on every call. That compounding is why one Cross-System Task Execution outcome costs about 13x a single chat message (for example, $0.296 per outcome on Claude Sonnet 4.6) rather than the price of one message.
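
The compounding can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual model: the text confirms only that tool steps make two model calls each, so one call per reasoning step and per decision point is an assumption, and the token sizes (`base_context`, `growth_per_call`) are made-up values chosen to show the shape of the effect.

```python
def model_calls(tool_steps: int, reasoning_steps: int, decision_points: int) -> int:
    # From the text: each tool step makes two model calls.
    # Assumption: reasoning steps and decision points make one call each.
    return 2 * tool_steps + reasoning_steps + decision_points


def total_input_tokens(calls: int, base_context: int, growth_per_call: int) -> int:
    # Every call re-reads the base context plus everything accumulated so far,
    # so input tokens grow roughly quadratically with the number of calls.
    return sum(base_context + i * growth_per_call for i in range(calls))


calls = model_calls(tool_steps=2, reasoning_steps=1, decision_points=3)
# Hypothetical sizes: 2,000-token base context, 500 tokens added per call.
agent_tokens = total_input_tokens(calls, base_context=2_000, growth_per_call=500)
single_message_tokens = 2_000

print(calls)                                 # 8
print(agent_tokens)                          # 30000
print(agent_tokens / single_message_tokens)  # 15.0
```

The 15x input-token ratio here comes from the invented token sizes and is not meant to reproduce the benchmark's 13x cost figure, which depends on the benchmark's own token assumptions and on output pricing.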

Frequently asked questions

How much does an AI agent cost to run Cross-System Task Execution?

On the clean path with default assumptions, an agent for Cross-System Task Execution costs about $0.0141 to $0.987 per outcome depending on the model, or roughly $141 to $9,870 per month at 10,000 outcomes. The cheapest model here is GPT-4o mini at $0.0141; the most expensive is Claude Fable 5 at $0.987.

Why does an AI agent cost more than a single chatbot message?

An agent does not make one model call. It plans, calls tools, retrieves context and re-reads its growing working context on every step. For Cross-System Task Execution that adds up to about 13x the cost of a single chat message.

Which model is cheapest for Cross-System Task Execution?

Across the 13 models benchmarked, GPT-4o mini is cheapest at $0.0141 per outcome and Claude Fable 5 is the most expensive at $0.987. A cheaper model is not always the right choice, but it sets the floor for this workflow.

How can I reduce the cost of an agent for Cross-System Task Execution?

The biggest levers are prompt caching on the base context, fewer planning loops, smaller tool results, less retrieval, and choosing a cheaper model where quality allows. You can test each lever in the live estimator.
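
As a rough illustration of the first lever, the sketch below prices one outcome with and without prompt caching. The token counts, the cached fraction and the cache discount are all hypothetical; real caching discounts and billing rules vary by provider. Only the $3.00 and $15.00 per-million prices come from the Claude Sonnet 4.6 row of the table.

```python
def outcome_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float,
                 cached_fraction: float = 0.0,
                 cache_discount: float = 0.0) -> float:
    """Dollar cost of one outcome; prices are in $ per 1M tokens.

    cached_fraction: share of input tokens served from a prompt cache (assumed).
    cache_discount: price reduction applied to cached tokens, 0.9 = 90% off (assumed).
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - cache_discount)
    return (billable_input * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 50,000 input and 4,000 output tokens per outcome.
baseline = outcome_cost(50_000, 4_000, price_in=3.00, price_out=15.00)
with_cache = outcome_cost(50_000, 4_000, price_in=3.00, price_out=15.00,
                          cached_fraction=0.6, cache_discount=0.9)
print(f"baseline ${baseline:.3f}, with caching ${with_cache:.3f}")
```

The same function can approximate the other levers: fewer planning loops and smaller tool results reduce `input_tokens`, and a cheaper model lowers `price_in` and `price_out`.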
