

Inference providers — who hosts what, and at what price

A working index of where you can rent a model — Anthropic, OpenAI, Google, the open-weight hosts, and the cloud platforms behind them

There are two kinds of inference provider, and confusing them costs real money.

The first kind hosts its own models. Anthropic serves Claude. OpenAI serves GPT. Google serves Gemini. These are first-party endpoints: the model maker is the seller, the pricing page is the source of truth, and a Business Associate Agreement (when offered) comes directly from the lab.

The second kind hosts other people's open-weight models. Together, Fireworks, Groq, Cerebras, SambaNova, Replicate, Modal, and Lambda all do some version of this: take a published checkpoint (Llama, Qwen, DeepSeek, GPT-OSS, Gemma, Mistral), put it on their hardware, expose an API. Same model, different host, different price, different throughput, different compliance posture. A Llama 4 token from Groq and a Llama 4 token from Together come from the same published checkpoint, billed by two different companies.

Cloud platforms add a third layer. AWS Bedrock and Google Vertex resell first-party models (Claude on Bedrock, Claude on Vertex, Gemini on Vertex) alongside open weights, with their own SLAs and BAAs. Azure does the same for OpenAI. OpenRouter sits a tier above everything as a single endpoint that routes to whichever provider you select. It explicitly does not mark up provider rates and charges only a platform fee on pay-as-you-go.

This page indexes the major providers as of June 2026 with pricing pulled from official pages. Pricing in this market moves quarterly. Always check the provider's pricing page before sizing a budget; the URLs in the citations below are the canonical sources. Where a number is missing from public docs (e.g., per-region pricing details, current BAA scope), this page says so explicitly rather than guess.

First-party vs third-party — the distinction that actually matters

First-party means the lab that trained the model is the one selling inference. Anthropic sells Claude. OpenAI sells GPT. Google sells Gemini through both AI Studio (the developer surface) and Vertex AI (the enterprise surface). The pricing page on the lab's domain is authoritative: if a third party quotes you a different number for the same first-party model, beyond a documented regional uplift, something is wrong.

Third-party means a host that has taken a publicly released open-weight model (Meta's Llama, Alibaba's Qwen, DeepSeek's V3 and R1 lineages, Mistral's open releases, Google's Gemma, OpenAI's GPT-OSS) and is running it on its own hardware. The same model checkpoint can appear on five hosts with five different price points and five different latency profiles. Together, Fireworks, Groq, Cerebras, SambaNova, and DeepInfra are the most visible names here. There is no proprietary lab relationship; the host is competing on hardware, scheduler, quantization choices, and price.

A few hybrid cases are worth flagging. Replicate runs open-weight LLMs and also resells some closed-model APIs (Claude, for example, appears as a passthrough). Hugging Face Inference Endpoints lets you deploy any model from the Hub on rented GPUs, which is closer to managed hosting than to a token marketplace. Modal and Anyscale are general GPU compute platforms that people use to serve their own model deployments. AWS Bedrock and Vertex AI host first-party models from multiple labs under a single cloud contract. OpenRouter is purely a router.

Knowing which kind a provider is tells you who controls the model version, who signs the BAA, and whether 'the same Claude' really is the same Claude. (Spoiler: Claude tokens from the Claude API, Bedrock, and Vertex are all served by Anthropic infrastructure or an Anthropic-managed deployment. They bill differently, but the model is the same.)

The index — major inference providers as of June 2026

Pricing is per million tokens (MTok) in USD, pulled from each provider's public pricing page. First-party hosts list flagship and budget tiers. Third-party hosts list one or two representative open-weight prices — they all have full catalogs. Always verify on the provider's live pricing page before committing a budget; this market reprices quarterly.

| Provider | Type | Representative model | Input $/MTok | Output $/MTok | Free tier | BAA / HIPAA |
| --- | --- | --- | --- | --- | --- | --- |
| Anthropic API | First-party (Claude) | Claude Sonnet 4.6 | $3.00 | $15.00 | Small starter credits | HIPAA-ready offering (enterprise) |
| Anthropic API | First-party (Claude) | Claude Opus 4.7 | $5.00 | $25.00 | Same starter credits | HIPAA-ready offering (enterprise) |
| Anthropic API | First-party (Claude) | Claude Haiku 4.5 | $1.00 | $5.00 | Same starter credits | HIPAA-ready offering (enterprise) |
| OpenAI API | First-party (GPT) | GPT-5.4 | $2.50 | $15.00 | Limited eval credits historically | BAA available; check current scope |
| OpenAI API | First-party (GPT) | GPT-5.4-mini | $0.75 | $4.50 | Same | Same |
| Google AI Studio | First-party (Gemini) | Gemini 2.5 Flash | $0.30 | $2.50 | Free tier (data used for product improvement) | Use Vertex for BAA |
| Google AI Studio | First-party (Gemini) | Gemini 3.5 Flash | $1.50 | $9.00 | Free tier (data used for product improvement) | Use Vertex for BAA |
| Vertex AI | First-party + multi-lab | Gemini 2.5 Pro | $1.25 (≤200k) | $10.00 | GCP free credits for new accounts | GCP HIPAA-eligible with BAA |
| Vertex AI | First-party + multi-lab | Claude Sonnet 4.5 (via Vertex) | Anthropic rates + regional uplift | Same | GCP free credits | GCP HIPAA-eligible with BAA |
| AWS Bedrock | Multi-lab resale | Claude Sonnet on Bedrock | Anthropic rates + regional uplift | Same | AWS Free Tier credits | AWS HIPAA-eligible with BAA |
| AWS Bedrock | Multi-lab resale | Mistral Large 3 | $0.50 | $1.50 | AWS Free Tier credits | AWS HIPAA-eligible with BAA |
| Azure OpenAI | First-party resale (GPT) | GPT family via Azure | Azure-listed (varies by region/tier) | Same | Azure free account credits | Azure HIPAA-eligible with BAA |
| Groq | Third-party (open weights) | Llama 3.3 70B Versatile | $0.59 | $0.79 | Free API key, rate-limited | Not publicly advertised; ask sales |
| Groq | Third-party (open weights) | Llama 3.1 8B Instant | $0.05 | $0.08 | Same | Same |
| Cerebras | Third-party (open weights) | GPT-OSS-120B / Llama 4 Scout | See Cerebras pricing page | Same | Free trial API access | See Cerebras trust center |
| SambaNova Cloud | Third-party (open weights) | Llama 3.3 70B Instruct | Listed; tier varies | Same | Trial credits referenced | Not publicly advertised |
| Together AI | Third-party (open weights) | Llama 3 8B Instruct Lite | $0.10 | $0.10 | Trial credits ('start for free') | Not on pricing page; ask sales |
| Together AI | Third-party (open weights) | Premium tier (e.g. GLM family) | Up to $1.40 | Up to $4.40 | Same | Same |
| Fireworks AI | Third-party (open weights) | Open-weight LLMs (see docs) | Listed in docs; embeddings from $0.008 | Same | $1 in free credits | Not on pricing page; ask sales |
| Replicate | Mixed (open + passthrough) | DeepSeek and others (token-billed) | Per-model (e.g., $3/MTok input on some) | Per-model | Limited free runs historically | Not publicly advertised |
| Perplexity API | Search-augmented LLM | Sonar Pro | $3.00 | $15.00 | Paid only | Not on standard pricing page |
| Perplexity API | Search-augmented LLM | Sonar (entry) | $1.00 | $1.00 | Paid only | Same |
| OpenRouter | Router (no hosting) | Pass-through to 60+ providers | Provider rate + 5.5% platform fee (PAYG) | Same | Free plan: 25+ models, 50 req/day | Inherits provider posture |
| Hugging Face Inference Endpoints | Managed hosting | Any Hub model on rented GPUs | GPU/hour (T4 ~$0.50/hr to B200 ~$9.25/hr) | Same | Free Spaces, Inference API trial | SOC 2 referenced for endpoints |
| Modal | Serverless GPU compute | Bring-your-own model | Per GPU-second (A100 ~$0.000694/sec) | Same | $30/month free credits (Starter) | SOC 2 (Starter); HIPAA on Enterprise |
| Anyscale | Ray-based compute | Bring-your-own deployments | Per GPU-hour (H100 ~$9.29/hr) | Same | $100 starter credits | Check Anyscale enterprise terms |
| Lambda | GPU cloud | On-demand GPU instances | Inference API winding down; GPU rental remains | Same | Varies | Check Lambda enterprise terms |
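
As a rough way to use the index, the sketch below copies a handful of rows from it and ranks them by the cost of a given token volume. The `Rate` class and the `workload_cost` and `rank_by_cost` helpers are hypothetical names invented for this illustration, and the row selection is arbitrary; price alone says nothing about whether a model passes your eval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    provider: str
    model: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# A few rows copied from the index above (June 2026 figures); not a full catalog.
RATES = [
    Rate("Anthropic API", "Claude Sonnet 4.6", 3.00, 15.00),
    Rate("Anthropic API", "Claude Haiku 4.5", 1.00, 5.00),
    Rate("OpenAI API", "GPT-5.4-mini", 0.75, 4.50),
    Rate("Google AI Studio", "Gemini 2.5 Flash", 0.30, 2.50),
    Rate("Groq", "Llama 3.3 70B Versatile", 0.59, 0.79),
    Rate("Groq", "Llama 3.1 8B Instant", 0.05, 0.08),
]


def workload_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a token volume at list price (no caching, batch, or uplift)."""
    return (input_tokens * rate.input_per_mtok
            + output_tokens * rate.output_per_mtok) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[float, Rate]]:
    """Return (cost, rate) pairs sorted from cheapest to most expensive."""
    pairs = [(workload_cost(r, input_tokens, output_tokens), r) for r in RATES]
    return sorted(pairs, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Example volume: 100M input tokens and 20M output tokens.
    for cost, rate in rank_by_cost(100_000_000, 20_000_000):
        print(f"{rate.provider:18s} {rate.model:26s} ${cost:,.2f}")
```

The ranking is only a first filter: swap in the rows you care about and re-check each number against the provider's live pricing page.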

Anthropic API — first-party Claude

Anthropic publishes a single pricing page for the Claude API. As of June 2026 the active tiers are Claude Opus (4.5, 4.6, 4.7, with 4.1 and 4 in deprecated state), Claude Sonnet (4.5 and 4.6), and Claude Haiku (4.5; 3.5 retired except on Bedrock and Vertex). Flagship Opus 4.5–4.7 is priced at $5 input / $25 output per million tokens. Sonnet 4.5–4.6 is $3 / $15. Haiku 4.5 is $1 / $5.

Prompt caching is a real lever. A five-minute cache write costs 1.25x the base input rate, a one-hour write costs 2x, and a cache read costs 0.1x. Read once and the five-minute cache has paid for itself; read twice and the one-hour cache has paid for itself. Batch API gives 50% off both input and output for asynchronous workloads. Data residency (US-only inference via the inference_geo parameter on Opus 4.6+) adds a 1.1x multiplier.

Claude is also available through AWS Bedrock and Google Vertex. These are Anthropic-managed deployments billed by the cloud provider. Regional and multi-region endpoints carry a 10% premium versus the global endpoint. For HIPAA-bound workloads, Anthropic offers an enterprise HIPAA-ready configuration; for AWS- or GCP-bound compliance, the cloud platform's BAA governs.
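
The cache break-even claim can be checked with a few lines of arithmetic. This is a minimal sketch that works in units of the base input price and uses only the multipliers quoted above (1.25x and 2x for writes, 0.1x for reads). It assumes the comparison is against sending the same prefix uncached on every call; the function names are made up for the illustration.

```python
CACHE_WRITE_5M = 1.25  # five-minute cache write, as a multiple of base input price
CACHE_WRITE_1H = 2.0   # one-hour cache write
CACHE_READ = 0.1       # cache read


def cached_cost(write_multiplier: float, reads: int) -> float:
    """One cache write followed by `reads` cache reads, in base-input units."""
    return write_multiplier + reads * CACHE_READ


def uncached_cost(reads: int) -> float:
    """The same prefix sent 1 + `reads` times with no caching."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float, max_reads: int = 100):
    """Smallest number of reads at which caching is strictly cheaper, or None."""
    for reads in range(1, max_reads + 1):
        if cached_cost(write_multiplier, reads) < uncached_cost(reads):
            return reads
    return None


if __name__ == "__main__":
    print("5-minute cache pays off after reads:", break_even_reads(CACHE_WRITE_5M))
    print("1-hour cache pays off after reads:", break_even_reads(CACHE_WRITE_1H))
```

With one read, the five-minute cache costs 1.35 units against 2.0 uncached. The one-hour cache needs a second read, because 2.1 units after one read is still above the 2.0 uncached cost.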

OpenAI API and Azure OpenAI

OpenAI's first-party endpoint hosts the GPT family. As of June 2026 the developer pricing page lists GPT-5.5 at $5.00 input / $30.00 output per million tokens, GPT-5.4 at $2.50 / $15.00, GPT-5.4-mini at $0.75 / $4.50, and GPT-5.4-nano at $0.20 / $1.25. Cached input gets a 90% discount. Batch processing gets 50% off. Regional processing (data residency) carries a 10% uplift on models released on or after 5 March 2026.

The Azure OpenAI Service is the Microsoft-hosted resale of the same model family, sold through Azure with Azure's compliance envelope (HIPAA-eligible under an Azure BAA). The pricing structure is similar, but billing runs through Azure and is subject to Azure region constraints; the canonical price reference is the Azure OpenAI Service pricing page. Treat the two surfaces as separate contracts: BAA via OpenAI directly for first-party use, BAA via Microsoft for Azure use.
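
To see how these modifiers move a bill, here is a small sketch that applies the cached-input discount, the batch discount, and the regional uplift to a token volume. The page does not say how the three combine, so multiplying them together is an assumption of this example, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
CACHED_INPUT_MULTIPLIER = 0.10  # "90% discount" on cached input
BATCH_MULTIPLIER = 0.50         # "50% off" for batch processing
REGIONAL_MULTIPLIER = 1.10      # "10% uplift" for regional processing


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float,
                  cached_fraction: float = 0.0,
                  batch: bool = False, regional: bool = False) -> float:
    """Estimated USD cost; rates are USD per million tokens.

    Assumption: the modifiers stack multiplicatively. Check the pricing
    page before relying on that for a real budget.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * input_rate
            + cached_tokens * input_rate * CACHED_INPUT_MULTIPLIER
            + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= BATCH_MULTIPLIER
    if regional:
        cost *= REGIONAL_MULTIPLIER
    return cost


if __name__ == "__main__":
    # GPT-5.4 list rates from the paragraph above: $2.50 in, $15.00 out.
    print(estimate_cost(1_000_000, 1_000_000, 2.50, 15.00))
    print(estimate_cost(1_000_000, 1_000_000, 2.50, 15.00, cached_fraction=0.5))
    print(estimate_cost(1_000_000, 1_000_000, 2.50, 15.00, batch=True))
```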

Google — AI Studio and Vertex AI

Google has two doors into the Gemini family. AI Studio is the developer surface: a free tier exists, but data sent through it is used for product improvement, and paid-tier prices are listed per model. Vertex AI is the enterprise surface: GCP project, GCP IAM, GCP BAA. Vertex also resells Claude (with regional uplift) alongside Google's own Gemini line.

As of June 2026, public pricing for Gemini 2.5 Flash on AI Studio is $0.30 input / $2.50 output per million tokens (text/image/video) with audio input at $1.00. Gemini 3.5 Flash is $1.50 / $9.00. Gemini 2.5 Flash-Lite is $0.10 / $0.40. On Vertex, Gemini 2.5 Pro is $1.25 input (≤200k) / $10.00 output, with input doubling above the 200k context threshold. Batch API offers 50% off standard pricing.

For compliance: AI Studio is the wrong door for regulated workloads, because the free tier's terms allow your data to be used for product improvement. Vertex is the correct door, with GCP's standard HIPAA-eligible posture under a Google Cloud BAA.
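
The 200k step on Gemini 2.5 Pro is easy to miss in a budget, so here is a hedged sketch of how it might be modeled. Two things are assumptions of this example rather than statements from the pricing page: that a prompt over the threshold is billed at the higher rate for the whole prompt, and that the output rate stays at $10.00. Verify both on the Vertex pricing page before using the numbers.

```python
THRESHOLD_TOKENS = 200_000
INPUT_RATE_LOW = 1.25    # USD per MTok for prompts of at most 200k tokens
INPUT_RATE_HIGH = 2.50   # "input doubling above the 200k context threshold"
OUTPUT_RATE = 10.00      # assumed flat in this sketch


def gemini_25_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request under the assumptions above."""
    if prompt_tokens <= THRESHOLD_TOKENS:
        input_rate = INPUT_RATE_LOW
    else:
        # Assumption: the higher rate applies to the entire prompt.
        input_rate = INPUT_RATE_HIGH
    return (prompt_tokens * input_rate + output_tokens * OUTPUT_RATE) / 1_000_000


if __name__ == "__main__":
    for prompt in (100_000, 200_000, 200_001, 400_000):
        print(prompt, round(gemini_25_pro_cost(prompt, 2_000), 6))
```

Under these assumptions a prompt one token over the threshold costs about twice as much on the input side, which is a reason to trim context before it crosses 200k.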

Third-party hosts of open-weight models

These providers host Llama, Qwen, DeepSeek, GPT-OSS, Gemma, and Mistral on their own hardware. The model checkpoint is the same across hosts; the price, latency, throughput, and compliance posture differ. Pricing below is illustrative — see the linked official pages in citations for full catalogs.

Groq

Llama 3.3 70B: $0.59 / $0.79 per MTok

LPU inference hardware. Optimized for very low time-to-first-token on open-weight LLMs. Public catalog includes Llama 3.x and Llama 4 variants, GPT-OSS 20B and 120B, Qwen3. Pricing is published per model. Free API key available with rate limits. Compliance certifications not publicly advertised on the pricing page — ask sales for the current posture.

Cerebras

Per-token rates on cerebras.ai/pricing

Wafer-scale inference. Hosts open-weight models including GPT-OSS-120B, Llama 4 Scout, and the GLM family. Tiered offering: free trial, self-serve Developer (add funds from $10), Enterprise. Per-token rates are listed on the pricing page rather than the inference landing page. Compliance posture is documented at the Cerebras trust center.

SambaNova Cloud

10 models on the pricing page

Reconfigurable dataflow hardware (RDU). Hosts DeepSeek (R1-Distill, V3.1, V3.2), Llama 3.3 70B, Llama 4 Maverick 17B, GPT-OSS-120B, Gemma 3 12B and 4 31B, MiniMax-M2.7. Per-million-token pricing is published per model — entry tier around $0.15 / $0.75, premium DeepSeek V3.x around $3.00 / $4.50. Trial credits referenced.

Together AI

25+ chat models, full price list public

Broad catalog of open-weight chat, image, and embedding models on standard GPU infrastructure. Serverless inference starts around $0.10 / $0.10 per MTok (Llama 3 8B Instruct Lite tier) and runs up to ~$1.40 / $4.40 for premium-tier models like GLM-5.1. 'Start for free' messaging on the landing page; compliance not on the pricing page itself.

Fireworks AI

$1 starter credit; cache and batch built in

Open-weight LLM hosting with cache and batch discounts (50% off cached input and batch inference by default). Embeddings from $0.008/MTok for small models. Per-model text and vision prices live in their docs rather than the pricing landing page. $1 free credit on signup.

Replicate

Hybrid: GPU-second, per-token, per-output

Mixed model: most models bill by hardware-seconds (GPU time × duration), some LLMs bill by token, image and video models bill per output unit. Useful when you want pinned model versions and per-version reproducibility. Free tier and compliance certifications not stated on the public pricing page.

Hugging Face Inference Endpoints

$0.033/hr starting; SOC 2 for endpoints

Closer to managed model hosting than a token marketplace. You deploy any Hub model to a dedicated endpoint on AWS, Azure, or GCP hardware and pay per GPU-hour. CPU instances from ~$0.03/hour. NVIDIA T4 from $0.50/hour. H100 from $4.50/hour. B200 from $9.25/hour. SOC 2 referenced for the Inference Endpoints product.

Modal

$30/mo free credits; SOC 2 (Starter), HIPAA (Enterprise)

Serverless GPU compute. You bring your own container; Modal handles cold starts, autoscaling, and per-second billing. A100 80GB at ~$0.000694/sec. H100 at ~$0.001097/sec. B200 at ~$0.001736/sec. Starter plan includes $30/month in free credits and SOC 2. Enterprise tier adds HIPAA and audit logs.
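
Per-second GPU prices are hard to compare with per-token prices by eye. The sketch below converts a per-second rate to an hourly rate and then to an effective $/MTok for a given sustained throughput. The throughput and utilization figures in the example are hypothetical placeholders, not measurements of any model or GPU; you would need to measure your own deployment.

```python
def per_hour(per_second_usd: float) -> float:
    """Convert a per-second GPU price to a per-hour price."""
    return per_second_usd * 3600


def effective_per_mtok(per_second_usd: float, tokens_per_second: float,
                       utilization: float = 1.0) -> float:
    """Effective USD per million tokens on a rented GPU.

    tokens_per_second: sustained throughput of your deployment (measure it).
    utilization: fraction of billed time spent serving tokens.
    """
    useful_rate = tokens_per_second * utilization
    if useful_rate <= 0:
        raise ValueError("throughput and utilization must be positive")
    return per_second_usd / useful_rate * 1_000_000


if __name__ == "__main__":
    a100 = 0.000694  # USD per second, from the paragraph above
    print(f"A100 hourly: ${per_hour(a100):.4f}")
    # Hypothetical throughput of 1,000 tokens/s at full and at 25% utilization.
    print(f"$/MTok at 100%: {effective_per_mtok(a100, 1000):.3f}")
    print(f"$/MTok at 25%:  {effective_per_mtok(a100, 1000, 0.25):.3f}")
```

The utilization term is the part that usually decides the comparison: a GPU that sits idle three quarters of the time quadruples the effective token price.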

Anyscale

$100 starter credit; pay-as-you-go GPU

Ray-native compute platform. Pay-as-you-go GPU instances: T4 ~$0.57/hr, A100 ~$4.96/hr, H100 ~$9.29/hr, H200 ~$10.68/hr. $100 in starter credits for new accounts. Anyscale-hosted endpoints product has evolved — confirm the current managed-endpoint offering before committing.

Lambda

Inference API winding down — GPU rental active

GPU cloud. As of mid-2026 Lambda's inference API product is being wound down; the on-demand GPU instance offering remains. If you were planning to use Lambda Inference API, verify current status before integrating; the GPU rental product is still active.

Cloud platforms — Bedrock, Vertex, Azure

The three hyperscalers wrap multiple labs' models under their own enterprise contract. The advantage is one contract of record: the same MSA, the same BAA, the same SSO, the same VPC posture covers your usage of multiple model families.

AWS Bedrock hosts Anthropic Claude, Meta Llama, Mistral, Amazon Titan, AI21 Jurassic, Cohere Command, and Stability image models. Claude on Bedrock is the same Claude as the Claude API, billed via AWS Marketplace; the regional endpoint variants for Claude 4.5 and later carry a 10% premium over the global endpoint. Bedrock is HIPAA-eligible under an AWS BAA.

Google Vertex AI hosts Gemini (first-party), Claude (via the Anthropic relationship), Llama via Model Garden, and a long tail of open weights. Vertex is the enterprise door for HIPAA-bound Gemini workloads; AI Studio is not. The Gemini 2.5 Pro tier on Vertex is priced at $1.25 input / $10.00 output per million tokens with a step up above 200k context.

Azure OpenAI Service is the Microsoft-resold form of OpenAI's GPT family with Azure's compliance envelope. Use it when you are already an Azure customer with an existing BAA; pricing maps closely to OpenAI's first-party rates but is governed by Azure region availability and Azure billing.

OpenRouter — the router layer

OpenRouter is not a host. It is a unified endpoint that routes your request to whichever provider serves the model you ask for, with automatic fallbacks if a primary provider degrades. The economics: free plan gives access to 25+ free models and 50 requests per day with no credit card. Pay-as-you-go adds a 5.5% platform fee on top of the underlying provider's rate. OpenRouter explicitly does not mark up provider pricing — what you see in the model catalog is the provider's actual rate. This makes OpenRouter useful for two things: cross-provider price discovery without holding accounts at all of them, and graceful fallback when one provider has an outage. The trade-off is that the compliance posture inherits from whichever underlying provider serves your request, so for regulated workloads you still need to pin to providers whose BAA you trust.
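
Two ideas from this section can be sketched in a few lines: the 5.5% platform fee on top of provider cost, and falling back to the next provider when one fails. The fallback function is an illustration of the general pattern, not OpenRouter's implementation, and the provider callables passed to it are hypothetical stand-ins for whatever client you use.

```python
PLATFORM_FEE = 0.055  # 5.5% pay-as-you-go platform fee, per the text above


def with_platform_fee(provider_cost_usd: float) -> float:
    """Provider cost plus the platform fee."""
    return provider_cost_usd * (1 + PLATFORM_FEE)


def call_with_fallback(providers, prompt):
    """Try each (name, callable) in order; return (name, result) from the first success.

    `providers` is an ordered list of hypothetical client functions that take
    a prompt and either return a response or raise an exception.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")


if __name__ == "__main__":
    def degraded(prompt):
        raise TimeoutError("primary provider timed out")

    def healthy(prompt):
        return f"echo: {prompt}"

    print(call_with_fallback([("primary", degraded), ("secondary", healthy)], "hi"))
    print(f"$100 of provider usage costs ${with_platform_fee(100):.2f}")
```

For regulated workloads the provider list itself is the control: keep only hosts whose BAA you trust in it, since a fallback can land on any entry.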

Perplexity API — search-augmented inference

Perplexity is structurally different from the rest of this list. Their Sonar family of API models is search-augmented — the model executes web queries against Perplexity's search infrastructure as part of generating a response, and you are billed for both the token usage and the requests. Sonar is $1.00 input / $1.00 output per million tokens. Sonar Pro is $3.00 / $15.00. Sonar Reasoning Pro and Sonar Deep Research are $2.00 / $8.00 with additional citation and reasoning token charges. Per-request fees apply on top of token pricing — between $5 and $14 per 1,000 requests depending on search context size and model. Their Agent API also brokers access to third-party models from OpenAI, Anthropic, Google, and xAI, which is useful when the value of the call is the search rather than the underlying base model. Use Perplexity when retrieval is the core of the workload; use a generic first-party API and your own retrieval stack when you need fine-grained control over the search step.
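
Because Sonar bills both tokens and requests, a per-token estimate alone understates the cost. The sketch below adds the two components using the Sonar Pro rates and the $5 to $14 per 1,000 requests range quoted above. Which fee applies to a given model and search context size is not stated here, so the example simply evaluates both ends of the range; the request volume and token counts are hypothetical.

```python
def sonar_cost(requests: int, input_tokens_per_request: int,
               output_tokens_per_request: int, input_rate: float,
               output_rate: float, fee_per_1k_requests: float) -> float:
    """Estimated USD cost: token charges plus per-request fees.

    Rates are USD per million tokens; the fee is USD per 1,000 requests.
    Extra citation and reasoning token charges are not modeled.
    """
    token_cost = requests * (input_tokens_per_request * input_rate
                             + output_tokens_per_request * output_rate) / 1_000_000
    request_cost = requests / 1000 * fee_per_1k_requests
    return token_cost + request_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 1,000 tokens in and 500 out each.
    for fee in (5.0, 14.0):
        total = sonar_cost(10_000, 1_000, 500, 3.00, 15.00, fee)
        print(f"Sonar Pro at ${fee:.0f}/1k requests: ${total:.2f}")
```

In this example the request fees add between roughly half and more than the full token cost, so for short prompts the per-request fee can dominate.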

How to choose — a minimum-effective-dose rule

Most teams overspend on inference by defaulting to the most expensive flagship for every call. The cheaper strategy, and usually the better one, is to route most traffic to a cheap tier and reserve the flagship for the small fraction of calls that actually need it. A working heuristic:

  • Default to the smallest, cheapest model that passes your eval. Haiku 4.5 ($1/$5), Gemini 2.5 Flash-Lite ($0.10/$0.40), or a small Llama variant on Groq ($0.05/$0.08) is the right first stop, not Opus.
  • Only escalate to flagship (Opus, GPT-5.5, Gemini 3.1 Pro) when the smaller tier visibly fails on real tasks, and only for the calls that need it. Route, don't replace.
  • Turn on prompt caching the moment your system prompt or shared context exceeds a few hundred tokens. Cache reads at 0.1x base input price are the single largest cost lever on long-context workloads.
  • Use batch APIs (50% off on Anthropic, OpenAI, Google, and Fireworks per the sections above; check other hosts individually) for anything that can tolerate asynchronous turnaround: bulk classification, eval grading, content backfills.
  • For regulated workloads, pick the cloud envelope first, then the model. AWS BAA + Bedrock + Claude is one paper; Anthropic enterprise HIPAA-ready is another; Azure BAA + Azure OpenAI is a third. Pick one and don't sprawl.
  • For raw cost discovery across the open-weight ecosystem, route through OpenRouter for a quarter and read the per-model usage report. You'll learn which hosts your workload actually likes before signing direct contracts.
  • Verify pricing the week you sign a contract. Every number on this page is dated June 2026 and was pulled from official sources; this market reprices on a quarterly cadence.
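
The "route, don't replace" rule from the list above can be sketched as a two-step call plus a blended-rate estimate. Everything here is illustrative: `cheap_model`, `flagship_model`, and `passes_check` are hypothetical callables you would supply, and the blended-rate formula assumes every call hits the cheap tier first and escalated calls pay for both tiers.

```python
def route(prompt, cheap_model, flagship_model, passes_check):
    """Try the cheap tier first; escalate only when the check fails.

    All three callables are hypothetical placeholders for your own clients
    and your own quality check.
    """
    answer = cheap_model(prompt)
    if passes_check(prompt, answer):
        return "cheap", answer
    return "flagship", flagship_model(prompt)


def blended_rate(cheap_rate: float, flagship_rate: float,
                 escalation_fraction: float) -> float:
    """Average USD per MTok when a fraction of calls is retried on the flagship."""
    if not 0.0 <= escalation_fraction <= 1.0:
        raise ValueError("escalation_fraction must be between 0 and 1")
    return cheap_rate + escalation_fraction * flagship_rate


if __name__ == "__main__":
    # Haiku 4.5 ($1/$5) as the default, Opus ($5/$25) for 10% of calls.
    print("blended input $/MTok:", blended_rate(1.00, 5.00, 0.10))
    print("blended output $/MTok:", blended_rate(5.00, 25.00, 0.10))
    tier, _ = route("2+2?", lambda p: "4", lambda p: "four",
                    lambda p, a: a.strip() == "4")
    print("served by:", tier)
```

At a 10% escalation rate the blended input price is about $1.50 per MTok against $5.00 for flagship-only traffic, even after paying for the failed cheap attempts.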

What this page does not promise

Three honest caveats.

First: BAA / HIPAA / SOC 2 status is more subtle than a yes/no column can capture. Several providers (Together, Fireworks, Groq, SambaNova, Replicate) do not display compliance certifications on their public pricing pages even when they hold them, so you have to ask sales. Treat 'not publicly advertised' as 'unknown without a sales call,' not as 'does not exist.'

Second: this page does not list every available model at every provider. The catalogs are too large and they change weekly. The point of the index is to show the structure of the market and let you go to the canonical pricing page for the specific model you need.

Third: latency, throughput, time-to-first-token, and reliability matter as much as $/MTok and are not captured here. Two providers can serve the same Llama 4 checkpoint at the same price and one of them will be three times faster in practice. Benchmark on your actual workload before you commit.
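
For the benchmarking step, a minimal timing harness might look like the sketch below. It measures time-to-first-token and overall throughput for anything that yields tokens or chunks one at a time. The iterator you pass in is an assumption: it stands for whatever streaming response object your client library returns, and the timer should wrap the request itself, so pass a lazy iterator that only issues the call when first consumed.

```python
import time


def measure_stream(token_iter):
    """Consume a token/chunk iterator and report basic latency numbers.

    Returns a dict with time-to-first-token in seconds (None if the stream
    was empty), the number of items seen, and items per second overall.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_iter:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    total = time.perf_counter() - start
    return {
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "tokens": count,
        "tokens_per_s": count / total if total > 0 else 0.0,
    }


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a real streaming response: a short delay, then tokens.
        time.sleep(0.05)
        for token in ["same", "checkpoint", "different", "host"]:
            time.sleep(0.01)
            yield token

    print(measure_stream(fake_stream()))
```

Run it several times per provider with your real prompts, and compare medians rather than single runs, since cold starts and rate limits skew individual measurements.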

Sources

  1. Claude API per-token pricing for Opus 4.5–4.7 ($5/$25), Sonnet 4.5–4.6 ($3/$15), and Haiku 4.5 ($1/$5), plus caching and batch multipliers.
     https://platform.claude.com/docs/en/about-claude/pricing
  2. Anthropic's public pricing landing page, source of truth for Claude API rates and enterprise HIPAA-ready offering.
     https://claude.com/pricing
  3. OpenAI API per-token pricing for GPT-5.5, 5.4, 5.4-mini, 5.4-nano, cached input discount, batch and regional uplift policy.
     https://developers.openai.com/api/docs/pricing
  4. Google AI Studio pricing for Gemini 2.5 Flash, 2.5 Flash-Lite, 3.1 Flash-Lite, and 3.5 Flash, plus free tier data-use terms.
     https://ai.google.dev/pricing
  5. Vertex AI Gemini pricing tiers including 2.5 Pro ($1.25/$10.00, step up above 200k) and 3.x Flash variants.
     https://cloud.google.com/vertex-ai/generative-ai/pricing
  6. AWS Bedrock foundation model pricing across Anthropic Claude, Meta Llama, Mistral, and Amazon Titan with batch discount notes.
     https://aws.amazon.com/bedrock/pricing/
  7. Together AI serverless inference pricing range from $0.10/$0.10 (Llama 3 8B Lite tier) to $1.40/$4.40 (GLM premium tier).
     https://www.together.ai/pricing
  8. Groq per-model pricing for Llama 3.1 8B, Llama 3.3 70B, Llama 4 Scout, GPT-OSS 20B and 120B, Qwen3 32B.
     https://groq.com/pricing
  9. Fireworks pricing structure with 50% cache and batch discounts and embedding tier pricing; $1 free credit on signup.
     https://fireworks.ai/pricing
  10. SambaNova Cloud catalog of 10 open-weight models with separate input/output per-million-token pricing per model.
     https://cloud.sambanova.ai/pricing
  11. Cerebras inference tiers (free trial, Developer self-serve from $10, Enterprise) and hosted models including GPT-OSS-120B and Llama 4 Scout.
     https://www.cerebras.ai/inference/
  12. Replicate's hybrid pricing: hardware-time, per-token for LLMs, per-output for image/video models.
     https://replicate.com/pricing
  13. OpenRouter free plan (25+ models, 50 req/day), pay-as-you-go with 5.5% platform fee, and explicit no-markup policy on provider rates.
     https://openrouter.ai/pricing
  14. Perplexity Sonar pricing: Sonar ($1/$1), Sonar Pro ($3/$15), Sonar Reasoning Pro and Deep Research ($2/$8 + extras), plus per-request fees.
     https://docs.perplexity.ai/guides/pricing
  15. Hugging Face Inference Endpoints hourly GPU rates (T4 ~$0.50/hr through B200 ~$9.25/hr) and SOC 2 reference.
     https://huggingface.co/pricing
  16. Modal per-GPU-second pricing (A100 80GB ~$0.000694, H100 ~$0.001097, B200 ~$0.001736), $30/month Starter credits, SOC 2 and HIPAA-on-Enterprise.
     https://modal.com/pricing
  17. Anyscale per-GPU-hour pricing (T4 ~$0.57, A100 ~$4.96, H100 ~$9.29, H200 ~$10.68) with $100 starter credits.
     https://www.anyscale.com/pricing
  18. Lambda's Inference API is being wound down; the GPU instance rental product remains active.
     https://lambda.ai/inference
  19. Azure OpenAI Service hosts the OpenAI GPT family under Microsoft's Azure compliance envelope (HIPAA-eligible with Azure BAA).
     https://azure.microsoft.com/en-us/products/ai-services/openai-service/
  20. AWS HIPAA eligibility and BAA framework, which governs Bedrock usage for regulated workloads.
     https://aws.amazon.com/compliance/hipaa-compliance/
LAB · ATOMEONS · MARCO ISLAND FLÆONS RESEARCH · 12 PAPERS · CC-BY 4.0ORANGEBOX v1.0.0-beta · TURBO-OPTIMIZE CLAUDE · SHIPPED 2026-05-30B00KMAKR v3.2.0 · AI PUBLISHING COCKPIT · MAC + WINDOWSFREE LAUNCH WEEK · ENDS JUNE 6 · §4A NO-SAAS LOCKFOUNDER'S VIEW · NEXT BROADCAST IN ...CITE THE WORK · FORWARD THE LINK · NO ALGORITHM