
Providers

CostKey auto-detects 15 AI providers by matching the URL hostname of outgoing HTTP requests. No configuration needed — if you call any of these APIs, CostKey captures it.

Supported providers

| Provider | Hostname(s) | Auto-detected | Notes |
| --- | --- | --- | --- |
| OpenAI | api.openai.com | Yes | Includes GPT-4o, o1, o3, DALL-E, Whisper, Embeddings |
| Anthropic | api.anthropic.com | Yes | Claude models. Cache tokens tracked (read + creation) |
| Google | generativelanguage.googleapis.com, *-aiplatform.googleapis.com | Yes | Gemini, PaLM, Vertex AI |
| Azure OpenAI | *.openai.azure.com | Yes | Azure-hosted OpenAI models |
| Groq | api.groq.com | Yes | Fast inference (Llama, Mixtral, Gemma) |
| xAI | api.x.ai, api.grok.xai.com | Yes | Grok models |
| Mistral | api.mistral.ai | Yes | Mistral, Mixtral, Codestral |
| DeepSeek | api.deepseek.com | Yes | DeepSeek-V3, DeepSeek-R1. Reasoning + cache tokens tracked |
| Cohere | api.cohere.com | Yes | Command, Embed, Rerank |
| Together AI | api.together.xyz | Yes | Open-source model hosting |
| Fireworks AI | api.fireworks.ai | Yes | Fast open-source inference |
| Perplexity | api.perplexity.ai | Yes | Sonar models |
| Cerebras | api.cerebras.ai | Yes | Fastest inference (Llama) |
| OpenRouter | openrouter.ai | Yes | Multi-provider router |
| AWS Bedrock | bedrock-runtime.*.amazonaws.com | Yes | Claude, Titan, Llama on AWS |

How auto-detection works

On every fetch call (TypeScript) or httpx/requests call (Python), the SDK checks the request URL's hostname against the provider list. The check is a simple string match — it runs in microseconds and adds zero measurable overhead to non-AI calls.
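That check can be sketched as a plain set/suffix lookup. The provider table below is an illustrative subset using hostnames from this page; it is not the SDK's internal table:

```python
from urllib.parse import urlparse

# Illustrative subset of the provider table (hostnames from the table above).
EXACT_HOSTS = {"api.openai.com", "api.anthropic.com", "api.groq.com"}
WILDCARD_SUFFIXES = (".openai.azure.com",)  # covers *.openai.azure.com

def is_ai_provider(url: str) -> bool:
    """Cheap string match on the request hostname; no regex, no network."""
    host = urlparse(url).hostname or ""
    return host in EXACT_HOSTS or host.endswith(WILDCARD_SUFFIXES)

is_ai_provider("https://api.openai.com/v1/chat/completions")  # True
is_ai_provider("https://example.com/v1/data")                 # False
```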

When a match is found, the SDK:

  1. Captures a stack trace (for call site attribution)
  2. Records the start time
  3. Passes the request through to the provider unmodified
  4. Reads the response (cloned) to extract token usage
  5. Sends a CostKey event asynchronously
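The five steps above can be sketched as a wrapper around a low-level send function. The names here (`instrument`, `record_event`) are illustrative, not the SDK's API, and the real SDK reports events asynchronously:

```python
import time
import traceback

def instrument(send, record_event):
    """Wrap a send(request) -> dict callable with the five steps above.
    `record_event` stands in for the CostKey event reporter."""
    def wrapped(request):
        stack = traceback.format_stack()        # 1. capture call site
        start = time.monotonic()                # 2. record start time
        response = send(request)                # 3. pass through unmodified
        usage = (response or {}).get("usage")   # 4. read usage from (cloned) body
        record_event({                          # 5. emit event (async in the SDK)
            "stack_depth": len(stack),
            "latency_s": time.monotonic() - start,
            "usage": usage,
        })
        return response
    return wrapped
```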

For streaming responses, the SDK wraps the response stream to capture timing metrics (TTFT, tokens/sec) and extracts usage from the final SSE chunk.

Server-side cost calculation

The SDK sends raw token counts to the CostKey server. The server calculates costs using a fuzzy model matching system:

  • claude-sonnet-4-5-20250514 matches pricing for claude-sonnet-4-5
  • ft:gpt-4o:my-org:custom:abc matches pricing for gpt-4o
  • accounts/fireworks/models/llama-v3p1-70b-instruct matches llama-3.1-70b

This means you never need to update the SDK when prices change — the server pricing table is updated independently.

Adding a custom provider

If you use an AI provider not in the built-in list, register a custom extractor:

```python
import costkey
from costkey.types import Provider, NormalizedUsage
from urllib.parse import urlparse

class MyProviderExtractor:
    provider = Provider.UNKNOWN

    def match(self, url: str) -> bool:
        # Claim requests whose hostname belongs to your provider.
        return urlparse(url).hostname == "api.myprovider.com"

    def extract_usage(self, body: dict) -> NormalizedUsage | None:
        usage = body.get("usage")
        if not usage:
            return None
        return NormalizedUsage(
            input_tokens=usage.get("prompt_tokens"),
            output_tokens=usage.get("completion_tokens"),
            total_tokens=usage.get("total_tokens"),
        )

    def extract_model(self, request_body, response_body) -> str | None:
        # Prefer the model name the provider echoed back in the response.
        if isinstance(response_body, dict):
            return response_body.get("model")
        if isinstance(request_body, dict):
            return request_body.get("model")
        return None

costkey.register_extractor(MyProviderExtractor())
```

Provider-specific notes

OpenAI

CostKey injects stream_options: { include_usage: true } into streaming requests so that the final SSE chunk includes token usage. This is done transparently — your code doesn't need to change.
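The injection amounts to editing the JSON request body before it is forwarded. A minimal sketch (the helper name is illustrative):

```python
import json

def inject_stream_options(raw_body: bytes) -> bytes:
    """Add stream_options.include_usage to streaming OpenAI requests,
    leaving non-streaming bodies untouched."""
    body = json.loads(raw_body)
    if body.get("stream"):
        body.setdefault("stream_options", {})["include_usage"] = True
    return json.dumps(body).encode()
```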

Anthropic

Cache tokens are tracked separately:

  • cacheReadTokens / cache_read_tokens — tokens read from Anthropic's prompt cache
  • cacheCreationTokens / cache_creation_tokens — tokens written to cache

DeepSeek

Reasoning tokens (from DeepSeek-R1) are captured in the reasoningTokens field. Cache hit/miss tokens are also tracked.

AWS Bedrock

Bedrock URLs follow the pattern bedrock-runtime.<region>.amazonaws.com. All regions are auto-detected. The SDK extracts usage from Bedrock's response format, which wraps the underlying model's response.
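Matching every region reduces to a pattern over the hostname; a sketch:

```python
import re
from urllib.parse import urlparse

# Matches bedrock-runtime.<region>.amazonaws.com for any region.
BEDROCK_HOST = re.compile(r"^bedrock-runtime\.[a-z0-9-]+\.amazonaws\.com$")

def is_bedrock(url: str) -> bool:
    return bool(BEDROCK_HOST.match(urlparse(url).hostname or ""))
```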

Google / Vertex AI

Both the Generative Language API (generativelanguage.googleapis.com) and Vertex AI endpoints (*-aiplatform.googleapis.com) are detected.