
Pricing

There's no subscription required to use Synthetic. Instead, we charge based on usage: if you don't use the product, you don't get charged.

Always-on model pricing

We keep popular open-source models always-on: there's no boot time; they're just ready to go. Always-on models are charged per-token.

Here's the list of our always-on models:

Model | Context length | Input price (per million tokens) | Output price (per million tokens)
deepseek-ai/DeepSeek-R1 | 128k tokens | $0.55/mtok | $2.19/mtok
deepseek-ai/DeepSeek-R1-0528 | 128k tokens | $3.00/mtok | $8.00/mtok
deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 128k tokens | $0.90/mtok | $0.90/mtok
deepseek-ai/DeepSeek-V3 | 128k tokens | $1.25/mtok | $1.25/mtok
deepseek-ai/DeepSeek-V3-0324 | 128k tokens | $1.20/mtok | $1.20/mtok
google/gemma-2-27b-it | 8k tokens | $0.80/mtok | $0.80/mtok
meta-llama/Llama-3.1-405B-Instruct | 128k tokens | $3.00/mtok | $3.00/mtok
meta-llama/Llama-3.1-70B-Instruct | 128k tokens | $0.90/mtok | $0.90/mtok
meta-llama/Llama-3.1-8B-Instruct | 128k tokens | $0.20/mtok | $0.20/mtok
meta-llama/Llama-3.2-3B-Instruct | 128k tokens | $0.06/mtok | $0.06/mtok
meta-llama/Llama-3.3-70B-Instruct | 128k tokens | $0.90/mtok | $0.90/mtok
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 524k tokens | $0.22/mtok | $0.88/mtok
meta-llama/Llama-4-Scout-17B-16E-Instruct | 328k tokens | $0.15/mtok | $0.60/mtok
mistralai/Mistral-7B-Instruct-v0.3 | 32k tokens | $0.20/mtok | $0.20/mtok
mistralai/Mixtral-8x22B-Instruct-v0.1 | 64k tokens | $1.20/mtok | $1.20/mtok
mistralai/Mixtral-8x7B-Instruct-v0.1 | 32k tokens | $0.60/mtok | $0.60/mtok
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | 32k tokens | $0.90/mtok | $0.90/mtok
Qwen/Qwen2.5-72B-Instruct | 128k tokens | $0.90/mtok | $0.90/mtok
Qwen/Qwen2.5-7B-Instruct | 32k tokens | $0.18/mtok | $0.18/mtok
Qwen/Qwen2.5-Coder-32B-Instruct | 32k tokens | $0.80/mtok | $0.80/mtok
Qwen/Qwen3-235B-A22B | 128k tokens | $0.20/mtok | $0.60/mtok
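As a quick illustration of per-token billing, the sketch below estimates the cost of a single request at the deepseek-ai/DeepSeek-R1 rates from the table above; the token counts are made-up example values.

```python
# Per-token billing example (illustrative only; token counts are made up).
# deepseek-ai/DeepSeek-R1 rates from the table above:
INPUT_PRICE_PER_MTOK = 0.55   # dollars per million input tokens
OUTPUT_PRICE_PER_MTOK = 2.19  # dollars per million output tokens

input_tokens = 100_000        # hypothetical prompt size
output_tokens = 5_000         # hypothetical completion size

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
print(f"Estimated cost: ${cost:.5f}")   # roughly $0.066 for this example
```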

LoRA pricing

LoRAs of the following base models are kept always-on and are charged per-token.

Base model | Input price (per million tokens) | Output price (per million tokens)
meta-llama/Llama-3.2-1B-Instruct | $0.06/mtok | $0.06/mtok
meta-llama/Llama-3.2-3B-Instruct | $0.06/mtok | $0.06/mtok
meta-llama/Meta-Llama-3.1-8B-Instruct | $0.20/mtok | $0.20/mtok
meta-llama/Meta-Llama-3.1-70B-Instruct | $0.90/mtok | $0.90/mtok

LoRA sizes are measured in "ranks," starting at rank-8; we keep LoRAs up to rank-64 always-on and run them in FP8 precision. The rank is set during finetuning: if you create your own LoRA, you can choose exactly the rank you want through the standard configuration options of your training framework.
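For example, if you train with Hugging Face's peft library, the rank is just the r field on LoraConfig. Here's a minimal sketch; the specific hyperparameters and target modules below are placeholder choices, not recommendations:

```python
# Minimal LoRA setup with Hugging Face peft; hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Gated repo: requires Hugging Face access to download.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

lora_config = LoraConfig(
    r=16,                                # LoRA rank; ranks 8-64 are kept always-on
    lora_alpha=32,                       # scaling factor
    target_modules=["q_proj", "v_proj"], # which projection layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()       # sanity-check how small the adapter is
```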

For LoRAs of base models not listed in the table above, we support running them on-demand as long as vLLM supports them; however, since those base models aren't kept always-on, you'll be charged our standard on-demand pricing for the base model (with no additional charge for the LoRA).

On-demand pricing

We support launching all other LLMs on-demand on cloud GPUs. There's no configuration necessary: just enter the Hugging Face link for any model, and we'll automatically run it for you in our friendly chat UI or API.
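If you'd rather script against the API than use the chat UI, here's a minimal sketch assuming an OpenAI-compatible endpoint; the base URL, API key handling, and model identifier are placeholders, so check the API docs for the real values.

```python
# Sketch of an API call, assuming an OpenAI-compatible endpoint.
# The base URL and model ID below are placeholders, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.synthetic.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="your-org/your-model",  # any Hugging Face repo ID
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(response.choices[0].message.content)
```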

We'll automatically detect the number of GPUs you need to run the model. Here's our current GPU pricing:

GPU type | Price (per GPU)
80GB | 3 cents/min
48GB | 1.5 cents/min
24GB | 1.2 cents/min

Our on-demand GPU rates are very competitive: for example, an 80GB GPU on Synthetic costs roughly half of what it does on comparable services like Replicate or Modal Labs.

We automatically calculate the type and number of GPUs a model repository requires. We don't quantize on-demand models: they're launched in whatever precision the underlying repo uses, typically BF16, with the exception of Jamba-based models, which are launched in FP8. Quantizing past FP8 can significantly harm model performance.
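For budgeting purposes, here's a rough back-of-the-envelope sketch. It counts only BF16 weight memory (2 bytes per parameter) and ignores KV-cache and activation overhead, so it's a simplification rather than the exact sizing logic we run:

```python
# Back-of-the-envelope GPU sizing and cost -- illustrative only, not the exact
# logic used to pick GPUs for a repository.
import math

def estimate_cost_cents(params_billion: float, minutes: float,
                        gpu_gb: int = 80, cents_per_gpu_min: float = 3.0) -> float:
    """Estimate on-demand cost in cents, counting only BF16 weight memory
    (2 bytes/param) and ignoring KV-cache/activation headroom."""
    weights_gb = params_billion * 2          # BF16 = 2 bytes per parameter
    gpus = math.ceil(weights_gb / gpu_gb)    # naive packing onto 80GB GPUs
    return gpus * cents_per_gpu_min * minutes

# e.g. a 70B-parameter BF16 model for a 30-minute session:
# 140GB of weights -> 2 x 80GB GPUs -> 2 * 3 * 30 = 180 cents ($1.80)
print(estimate_cost_cents(70, 30))
```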

On-demand models are capped at a maximum context length of 32k tokens.