Pricing | Synthetic

Synthetic can use either subscription or usage-based pricing. Choose the plan that works best for you.

Standard

$20/month

Perfect for individuals just starting out.

✓Access to all always-on models

✓Both UI and API access

✓Cancel anytime

✓Standard rate limits: 135 requests every five hours

✓3x higher rate limits than Claude's $20/month plan

Pro

$60/month

For professionals and avid LLM users.

✓Access to all always-on models

✓Both UI and API access

✓Cancel anytime

✓10x higher rate limits: 1,350 requests every five hours

✓6x higher rate limits than Claude's $100/month plan

✓50% higher rate limits than Claude's $200/month plan

Usage-based

For enterprise users and custom models.

✓UI and API access

✓Always-on models are pay-per-token

✓On-demand models are pay-per-minute

Perfect for individuals just starting out.

✓Access to all always-on models

✓Both UI and API access

✓Cancel anytime

✓Standard rate limits: 135 requests every five hours

✓3x higher rate limits than Claude's $20/month plan

For professionals and avid LLM users.

✓Access to all always-on models

✓Both UI and API access

✓Cancel anytime

✓10x higher rate limits: 1,350 requests every five hours

✓6x higher rate limits than Claude's $100/month plan

✓50% higher rate limits than Claude's $200/month plan

For enterprise users and custom models.

✓Pay for what you use

✓Both UI and API access

✓Always-on models are pay-per-token

✓On-demand models are pay-per-minute

Always-on models

All always-on models are included in your subscription. There's no additional charge for using any of these models.

All-inclusive pricing

With your subscription, all always-on models are included for one flat monthly price. No per-token billing—just simple, predictable pricing.

Switch to "Pay per Use" to see token-based pricing for when you don't need a subscription.

Here's the list of all always-on models included in your subscription:

Model	Context length	Status
deepseek-ai/DeepSeek-R1	128k tokens	✓ Included
deepseek-ai/DeepSeek-R1-0528	128k tokens	✓ Included
deepseek-ai/DeepSeek-V3	128k tokens	✓ Included
deepseek-ai/DeepSeek-V3-0324	128k tokens	✓ Included
deepseek-ai/DeepSeek-V3.1	128k tokens	✓ Included
deepseek-ai/DeepSeek-V3.1-Terminus	128k tokens	✓ Included
deepseek-ai/DeepSeek-V3.2	159k tokens	✓ Included
meta-llama/Llama-3.1-405B-Instruct	128k tokens	✓ Included
meta-llama/Llama-3.1-70B-Instruct	128k tokens	✓ Included
meta-llama/Llama-3.1-8B-Instruct	128k tokens	✓ Included
meta-llama/Llama-3.3-70B-Instruct	128k tokens	✓ Included
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8	524k tokens	✓ Included
meta-llama/Llama-4-Scout-17B-16E-Instruct	328k tokens	✓ Included
MiniMaxAI/MiniMax-M2	192k tokens	✓ Included
moonshotai/Kimi-K2-Instruct	128k tokens	✓ Included
moonshotai/Kimi-K2-Instruct-0905	256k tokens	✓ Included
moonshotai/Kimi-K2-Thinking	256k tokens	✓ Included
openai/gpt-oss-120b	128k tokens	✓ Included
Qwen/Qwen3-235B-A22B-Instruct-2507	256k tokens	✓ Included
Qwen/Qwen3-235B-A22B-Thinking-2507	256k tokens	✓ Included
Qwen/Qwen3-Coder-480B-A35B-Instruct	256k tokens	✓ Included
Qwen/Qwen3-VL-235B-A22B-Instruct	250k tokens	✓ Included
zai-org/GLM-4.5	128k tokens	✓ Included
zai-org/GLM-4.6	198k tokens	✓ Included

LoRA models

What's a LoRA?

Low-rank adapters — called "LoRAs" — are small, efficient fine-tunes that run on top of existing models. They can modify a model to be much more effective at specific tasks.

All LoRAs for the following base models are included in your subscription:

Model	Context length	Status
meta-llama/Llama-3.2-1B-Instruct	128k tokens	✓ Included
meta-llama/Llama-3.2-3B-Instruct	128k tokens	✓ Included
meta-llama/Meta-Llama-3.1-8B-Instruct	128k tokens	✓ Included
meta-llama/Meta-Llama-3.1-70B-Instruct	128k tokens	✓ Included

LoRA sizes are measured in "ranks," starting at rank-8; we support up to rank-64 LoRAs kept always-on, and we run them in FP8 precision. The rank is set during the finetuning process: if you create your own LoRA, you'll be able to set exactly what rank you want using standard configuration for your training framework.

For LoRAs of base models not listed in the table above, we support running them on-demand as long as vLLM does; however, since the base models aren't kept always-on, you'll be charged our standard on-demand pricing for the base model (and no additional charge for the LoRA).

On-demand pricing

We support launching all other LLMs on-demand on cloud GPUs. There's no configuration necessary: just enter the Hugging Face link for any model, and we'll automatically run it for you in our friendly chat UI or API.

On-demand models are charged per-minute that the model is running.

Even with a subscription, on-demand models are charged separately per minute.

These models are not included in your subscription and will be billed at standard rates.

We'll automatically detect the number of GPUs you need to run the model.

GPU Type	Up to	Price
24GB	24GB (1 GPU)	1.2 cents/min, per GPU
48GB	192GB (4 GPUs)	1.5 cents/min, per GPU
80GB	640GB (8 GPUs)	3 cents/min, per GPU

Our on-demand GPU rates are very competitive: for example, an 80GB GPU is ~2x cheaper on Synthetic than on competing services like Replicate or Modal Labs.

We automatically calculate the type and number of GPUs required for a model repository for you. We don't quantize on-demand models: they're launched in whatever precision the underlying repo uses; typically, BF16, with the exception of Jamba-based models which are launched in FP8. Quantizing past FP8 can significantly harm model performance.

On-demand model context length is capped to a maximum of 32k tokens.